Abstract

The state-of-the-art approach to reducing complexity in software
development is to use the abstraction mechanisms of programming
languages, such as modules, types and higher-order functions, and to
develop high-level frameworks and domain-specific abstractions.
Along with simplicity, however, abstraction mechanisms also
introduce execution overhead and often lead to significant performance
degradation. Avoiding abstractions in favor of performance, on the
other hand, increases code complexity and the cost of maintenance.

We develop a systematic approach and formalized framework 
for implementing software components with a first-class special- 
ization capability. We show how to extend a higher-order functional 
language with abstraction mechanisms carefully designed to pro- 
vide automatic and guaranteed elimination of abstraction overhead. 

We propose staged evaluation as a new method of program stag- 
ing and show how it can be implemented as zipper-based traver- 
sal of program terms where one-hole contexts are generically con- 
structed from the abstract syntax of the language. 

We show how generic programming techniques together with
staged evaluation lead to a very simple yet powerful method of
isomorphic specialization, which utilizes first-class definitions of
isomorphisms between data types to guarantee abstraction
elimination.

We give a formalized description of the isomorphic specializa- 
tion algorithm and show how it can be implemented as a set of term 
rewriting rules using active patterns and staged evaluation. 

We have implemented our approach as a generic programming
framework with first-class staging, term rewriting and isomorphic
specialization, and we show in our evaluation that the proposed
capabilities give rise to a new paradigm for developing domain-specific
software components without abstraction penalty.

Categories and Subject Descriptors D.3.3 [Programming Lan- 
guages] : Language Constructs and Features 

Keywords Generic programming; polytypic programming; stag- 
ing; multi-stage programming; domain-specific languages; DSL; 
specialization; isomorphisms 

1. Introduction 

Most modern software development is done in high-level
languages, reducing program complexity through their built-in
abstraction mechanisms (such as module systems, classes, interfaces,
etc.). These mechanisms are often used to create domain-specific
languages (DSLs) which allow a higher level of abstraction for
programs in a given domain (e.g. Spark [33] can be considered as a
DSL for distributed programming).

However, these mechanisms generally also introduce execution 
overhead (often called abstraction regret 0, I21TI or abstraction 
penalty) and the trade-off between abstraction and performance is 
often difficult. 

Modern advances in compilation techniques, such as just-in- 
time compilation and whole program optimization generally can't 
eliminate the overhead completely and don't scale well with the 
size of the program. 

A recent trend is the development of DSL-centric frameworks
where abstractions can be introduced and software can be built
without abstraction penalty [5, 23, 24], though it may require
development of special tools [J].

In such frameworks, DSL compilers allow mapping of problem-specific
abstractions directly to low-level architecture-specific
programming models such as [12], [20]. However, the development of
DSLs is difficult by itself, and adding a compilation stage
considerably increases this difficulty.

While compiling DSLs is a promising approach, we believe that 
there is still much to be done in tackling the abstraction penalty 
problem. 

In particular, using multiple DSLs together in a single application
is a well-known problem [30]. Existing DSL-centric frameworks
usually require additional effort for the integration and
interoperation of multiple DSLs.

Another problem is rapid prototyping of new DSLs and of
applications developed with DSLs. While there is evidence that even
simple techniques can lead to significant benefits [23], we think
that this problem requires a more generic and systematic approach.

In general, we believe that the problems of popular parallel
programming [2] and of abstraction overhead are two sides of the
same coin. In other words, a generic solution of the latter will also
lead to a generic solution of the former.

The present work originated from an attempt to apply the
staging [32] approach proposed in Lightweight Modular Staging
(LMS) [22] to the domain of nested data parallelism (NDP).

The NDP implementation in Haskell [8] led to various Haskell
extensions (such as support for non-parametric polymorphism (31)
and new transformation techniques [18]. It was also noticed [6] that
NDP could have a generic programming formulation.

We implemented NDP first as a polytypic library in Scala [27],
and then as a deep embedding [28] in the spirit of LMS. These
experiments demonstrated that staging can have some non-trivial
interaction with polytypic (or generic) programming. In this paper,
we show how we can combine these directions, with useful results
both for staging and for generic programming.

In the present paper, we advocate that it is useful to have first-class
declarations of isomorphisms between data types to implement
domain-specific compilers. In particular, we observe that isomorphisms
can serve as a bridge between two levels of indirection
(abstraction layers) in user-defined types. This helps in translating
program code from higher levels down to some core language,
performing specialization along the way.

The way we define and use isomorphisms is where our work
differs from previous approaches [14]. We do not infer
isomorphisms; instead, we require the programmer to think about
isomorphic representations of domain objects in their application.
We require the isomorphisms to be explicitly specified and
captured in the application domain. On the surface of the programming
language, we relate specifications of isomorphisms to declarations
of alternative concrete representations of abstract types. This
is just one of the possible implementations at the front-end of the
language and should not be considered a limitation of the
presented approach.

The key point is that after the isomorphisms are identified in the 
application domain, they play an important role interacting with 
primitives of the core language. Rewriting rules capture this inter- 
action between isomorphisms and primitives. For each polymor- 
phic primitive of the core language there are rules which tell how 
the primitive composes with isomorphisms. The core language it- 
self can be thought of as another domain-specific language. In this 
paper we use the Array type to represent such a domain specifically, 
but any other set of types and core primitives can be used as well. 

In Sections 2 and 4 we explicitly consider the case of multiple
concrete implementations for an abstract type. This is the key point
where non-trivial interaction of staging and generic programming
happens.

We define the notion of staged evaluation to explicitly connect 
staging to the evaluation semantics of the source language. This se- 
mantics implements a virtual method invocation mechanism asso- 
ciated with inheritance. The staged evaluation process mimics this 
dynamic invocation while producing a graph representation of the 
source program. 

Thus, if we perform staged evaluation of a function call like
mvm(new DenseMatr(...), vec) with an instance of a concrete
matrix type, we get a different program graph from the one
produced by evaluating mvm(new SparseMatr(...), vec).
This is a consequence of the presence of virtual method calls in the
semantics of the source language.

Thus motivated and inspired by our previous results, we develop
a systematic approach and a new specialization technique for
implementing domain-specific abstractions in a generic programming
framework with first-class staging, rewriting and specialization
capabilities. Our approach is based on a combination of generic
programming and staging.

In particular we present the following main contributions: 

1. We describe a new method of program staging (we call it staged 
evaluation) and show how it naturally arises from the evaluation 
semantics of the language to be staged. Our staging algorithm 
transforms program terms into directed acyclic graphs (DAGs)
as an intermediate representation (IR). We show how staged
evaluation can be described as a zipper-based Hal traversal of
program terms where one-hole contexts [19] are generically
constructed from evaluation reduction contexts.

2. We show how generic programming techniques together with 
staged evaluation lead to a very simple yet powerful method of 
isomorphic specialization which utilizes first-class definitions 
of isomorphisms between data types to guarantee abstraction
elimination. 

3. We give a formal description of the isomorphic specialization
algorithm and show that it can be implemented as a set of
graph rewriting rules using active patterns [11, 31] and staged
evaluation.

4. Last but not least, the ideas described in this paper are
implemented in our DSL-centric generic programming framework
with first-class staging. We show in our evaluation that
a programming framework with first-class isomorphic specialization
gives rise to a new paradigm and design pattern for the
development of both compiled DSLs and, more broadly,
domain-specific software components without abstraction penalty.

The paper is structured as follows. Section 2 gives an informal
yet precise description of our approach with a motivating example
from linear algebra. Section 3 formally describes our language,
staged evaluation and isomorphic specialization. In Section 4 we
evaluate our approach by comparing the performance of various
specializations. In Section 5 we compare our approach with related
work, and in Section 6 we conclude.

2. First-class Isomorphic Specialization in a 
Nutshell 

In this section we demonstrate the essence of isomorphic specialization
using a simple example. We consider the matrix-vector
multiplication (mvm) problem as our working example, which is shown
in Figure 1. We use a subset of the Scala language to express the
necessary abstract types and their various implementations.

  trait Vec[T] {
    def length: Int
    def dotProduct(vec: Vec[T]): T
  }

  trait Matr[T] {
    def rows: Array[Vec[T]]
  }

  def mvm(m: Matr[T], v: Vec[T]): Vec[T] = {
    val rs = m.rows // array of rows
    rs map { r => r.dotProduct(v) }
  }

Figure 1. Abstract matrix and vector. A trait in Scala is similar to
an interface in Java.

Imagine an object-oriented framework where the mvm algo- 
rithm can be expressed using interfaces of abstract data types like 
Matr [T] and Vec [T] . Then mvm can be executed using some con- 
crete classes implementing these interfaces. These implementations 
use different data structures for the in-memory representation of the 
data. Suppose the following: first, for each abstract type we have 
several implementations which have different performance char- 
acteristics (e.g. depending on sparseness of our data); second, we 
want to perform dynamic selection of the best representation based 
on the input data; and third, we are required to constrain ourselves 
to using only a certain core language. This could be the interme- 
diate language of some virtual machine (e.g. Java Virtual Machine 
byte-code), or we might have other reasons to limit the capabilities 
of the core language. 

This is a rather standard situation, with well known drawbacks 
and advantages. Among the drawbacks is the overhead imposed by 
interface method invocations. Among the advantages are modular- 
ity via encapsulation, flexibility to add new concrete implementa- 
tions and ability to make dynamic runtime choices. 

In the isomorphic specialization framework we can completely 
eliminate the overhead of method invocations while preserving the 
benefits mentioned above and thus fulfilling the abstraction with- 
out regret promise. Object-oriented code can be transformed into 
the core language with a limited set of types and primitive operations.
In this paper, the core language is a higher-order functional
language with pairs, sums, arrays and the primitives shown in Figure 4.
In Section 4 we show how to combine isomorphic specialization
into the core language with subsequent optimized compilation of
array operations of the core language using the LMS framework.

It is important to understand that staged evaluation and thus 
specialization both happen at runtime. Thus, transformation into 
the core language may depend on sparseness analysis so that we 
can dynamically select the best implementation for all the abstract 
data types (interfaces) used in mvm. Then we can specialize mvm 
with respect to the selected implementation and thus produce the 
optimal specialized version of mvm. 

Now, coming back to our example, we define two abstract data
types: vectors and matrices. They are represented by the interfaces
Vec[T] and Matr[T] respectively (see Figure 1), which contain
all the operations necessary to implement the algorithms we are
interested in. These interfaces are, in fact, our abstractions, which
are built above the core language: it is the user's design choice
how to name them and which properties and methods they have. In
this case the interface for vectors allows us to obtain the vector's
length and to calculate the dot product with another vector. The matrix
interface allows us to retrieve the rows of the matrix as an array of
vectors.

Array[T] is a core type (a type from the core language). In our
implementation it is a plain Scala (or Java) array of values of type
T.

Given these abstract types, we are able to implement mvm as
shown in Figure 1.

This code operates with a mixture of abstract types and core
types (like Array). In order to actually execute this code we need
to implement the abstract types by selecting some concrete
representations for matrices and vectors. For this purpose, we assume that
each interface is implemented by several concrete classes, as shown
in Figure 2.

  class DenseVec[T](val arr: Array[T]) extends Vec[T] {
    def length = arr.length
    def dotProduct(vec: Vec[T]) = vec match {
      case dv: DenseVec[T] => sum(arr |*| dv.arr)
      case sv: SparseVec[T] =>
        sum(sv.values |*| arr(sv.indices))
    }
  }

  class SparseVec[T](
      val indices: Array[Int], val values: Array[T],
      val length: Int) extends Vec[T] {
    def dotProduct(vec: Vec[T]) = vec match {
      case dv: DenseVec[T] => dv.dotProduct(this)
      case sv: SparseVec[T] =>
        dotProductSV(indices, values, sv.indices, sv.values)
    }
  }

  class DenseMatr[T](val rows: Array[DenseVec[T]]) extends Matr[T]
  class SparseMatr[T](val rows: Array[SparseVec[T]]) extends Matr[T]

Figure 2. Dense and sparse implementation of vector and matrix
types

Up to this point everything looks like the traditional object- 
oriented approach for designing abstractions and various imple- 
mentations. And this is our design choice. We are extending our 
functional core language with a limited set of object-oriented ab- 
straction mechanisms (such as interfaces, classes and methods). 

2 Of course, in practice they contain more methods. The figure shows only 
the methods used for mvm. 

3 In our implementation we use a covariant wrapper around Array which is 
invariant in Scala and a Rep type constructor similar to LMS. These details 
are out of scope of this paper. 



This is a programmer's interface and point of view into our frame- 
work. In our prototype, a programmer can just use some subset of 
Scala, where classes and functions naturally coexist. This serves as 
perfect input for all subsequent processing, which we are going to 
discuss below. 

Once we have the concrete implementations of abstract types
like DenseMatr, we can associate them with some core types. This
association or mapping is defined by means of special objects,
so-called isomorphisms (or isos for short). For simplicity we just
assume that each iso is a class that implements the interface shown
in Figure 3.

  trait Iso[From, To] {
    def to(x: From): To
    def from(y: To): From
  }

Figure 3. Interface of isomorphisms

As we will see later, isos can be automatically generated based
on the fields of concrete classes (e.g. DenseMatr has one field
rows), but this is just our convention to simplify the presentation.
More sophisticated mechanisms can be used as well, e.g. using
annotations in source code.

These iso-functions define transformations between values of
types. Usually isos are defined in such a way that it is possible, for
each concrete class C, to build a composition of isos which relates
C with some core type $\tau$.
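
As a minimal sketch of how such a composition might look in plain Scala, the following block builds an iso between a core type and a matrix class out of two primary isos. The helpers `andThen` and `lift`, the use of `Vector` in place of `Array` (to get structural equality), and the specialization to `Double` are assumptions made for illustration only; they are not the framework's actual API.

```scala
object IsoSketch {
  // Same shape as the Iso interface of Figure 3.
  trait Iso[From, To] { self =>
    def to(x: From): To
    def from(y: To): From
    // Hypothetical helper: sequential composition of two isos.
    def andThen[C](next: Iso[To, C]): Iso[From, C] = new Iso[From, C] {
      def to(x: From): C = next.to(self.to(x))
      def from(y: C): From = self.from(next.from(y))
    }
  }

  // Hypothetical helper: apply an iso element-wise to a collection.
  def lift[A, B](iso: Iso[A, B]): Iso[Vector[A], Vector[B]] =
    new Iso[Vector[A], Vector[B]] {
      def to(xs: Vector[A]): Vector[B] = xs.map(iso.to)
      def from(ys: Vector[B]): Vector[A] = ys.map(iso.from)
    }

  // Simplified, non-generic stand-ins for the classes of Figure 2.
  final case class DenseVec(arr: Vector[Double])
  final case class DenseMatr(rows: Vector[DenseVec])

  // Primary isos, one per class, derived from the class fields.
  object DVIso extends Iso[Vector[Double], DenseVec] {
    def to(x: Vector[Double]): DenseVec = DenseVec(x)
    def from(dv: DenseVec): Vector[Double] = dv.arr
  }
  object DMIso extends Iso[Vector[DenseVec], DenseMatr] {
    def to(x: Vector[DenseVec]): DenseMatr = DenseMatr(x)
    def from(dm: DenseMatr): Vector[DenseVec] = dm.rows
  }

  // Composite iso relating DenseMatr with a purely "core" representation.
  val denseMatrIso: Iso[Vector[Vector[Double]], DenseMatr] =
    lift(DVIso).andThen(DMIso)
}
```

In this sketch, `denseMatrIso.from(denseMatrIso.to(data))` returns `data` unchanged, which is the round-trip property one would expect from an isomorphism.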

What is more important, isomorphisms can also compose dur- 
ing staged evaluation and this composition can also be made depen- 
dent on a dynamic choice, for example based on sparseness anal- 
ysis. First-class Iso objects might be stored in a staging-time data 
structure or passed around in the program in other nontrivial ways. 
The program might even read a configuration file at specialization 
time and pick either IsoA or IsoB based on its contents. That's 
something that plain inlining and rewriting could never achieve. 

The intuition for isomorphisms here is that every time you
define a concrete implementation of some abstract data type, you
at the same time explicitly define an isomorphic representation
of instances of that type in the core language. Every time you
grow an abstraction over the core language by introducing an
abstract data type, you define a way back in all concrete
implementations.

Our method of code specialization works in this setup and is
enabled by such definitions of isomorphisms. It allows us to automatically
specialize invocations of mvm into code which is
specific to the particular matrix object which happens to be an
argument of the invocation at the time of staged evaluation (i.e.
at runtime). Moreover, all the calls to the methods like rows and 
dotProduct inside the body of mvm are also points of dynamic 
choice: they are evaluated as virtual method calls, but at staging 
time (where the values are expressions), the result is inlining of 
the body of the method which is selected dynamically based on 
the actual types of the objects (that is how evaluation semantics of 
method calls is defined). 
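
The effect described above can be illustrated with a toy sketch in which the fields of the vector classes hold expressions instead of data, so that an ordinary virtual call builds a program fragment. The expression type `Exp` and its node names are hypothetical and chosen only for this illustration; the actual intermediate representation is a graph of core primitives.

```scala
object StagedDispatchSketch {
  // A toy expression IR (an assumption of this sketch).
  sealed trait Exp
  final case class Sym(name: String) extends Exp                // staged input
  final case class ZipMul(a: Exp, b: Exp) extends Exp           // a |*| b
  final case class Gather(arr: Exp, indices: Exp) extends Exp   // arr(indices)
  final case class Sum(arr: Exp) extends Exp                    // sum(arr)

  // Staged counterparts of the classes of Figure 2.
  sealed trait Vec { def dotProduct(vec: Vec): Exp }

  final case class DenseVec(arr: Exp) extends Vec {
    def dotProduct(vec: Vec): Exp = vec match {
      case dv: DenseVec  => Sum(ZipMul(arr, dv.arr))
      case sv: SparseVec => Sum(ZipMul(sv.values, Gather(arr, sv.indices)))
    }
  }

  final case class SparseVec(indices: Exp, values: Exp) extends Vec {
    def dotProduct(vec: Vec): Exp = vec match {
      case dv: DenseVec => dv.dotProduct(this)
      case _: SparseVec => sys.error("sparse-sparse case omitted in this sketch")
    }
  }

  // One call site; the method body that gets inlined into the result
  // depends on the class of the receiver at staging time.
  def stagedDot(row: Vec, v: Vec): Exp = row.dotProduct(v)

  val v: Vec = DenseVec(Sym("v"))
  val denseGraph: Exp = stagedDot(DenseVec(Sym("row")), v)
  val sparseGraph: Exp = stagedDot(SparseVec(Sym("is"), Sym("vs")), v)
}
```

Here the same call `stagedDot(row, v)` yields `Sum(ZipMul(row, v))` for a dense row and `Sum(ZipMul(vs, Gather(v, is)))` for a sparse row; no trace of the method call itself remains in either result.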

Now, let's look at how the method of isomorphic specialization 
will work for at least two different concrete implementations of 
the abstract type Vec [T] . For demonstration we selected dense and 
sparse representations. 

Each concrete implementation of Vec[T] should implement
its own versions of all abstract methods of the interface. In our
example, these implementations are shown in Figure 2.

A dense vector is represented by the concrete class DenseVec[T].
It is mapped to the core types via an iso instance generated from the
constructor arguments. DenseVec is represented in the core language
simply as an array of values of type T.

A sparse vector is represented by the class SparseVec[T]. In the
core language it is represented by the vector length and a pair of
arrays: indices, which contains the indices of the non-zero elements, and
values, which contains the corresponding values.

A dense matrix is represented by the concrete class DenseMatr[T].
In the core language it is represented as an array of dense vectors
of type T.

A sparse matrix is represented by the concrete class SparseMatr[T].
In the core language it is represented as an array of sparse vectors.

  class Array[T] {
    def length: Int
    def apply(index: Int): T // get element at index
    def apply(indices: Array[Int]): Array[T]
    def map[R](f: T => R): Array[R]
    def filter(p: T => Boolean): Array[T]
    def zip[U](other: Array[U]): Array[(T,U)]
    def |*|(other: Array[T]): Array[T] // element-wise op
  }

  def range(start: Int, len: Int): Array[Int]
  def sum[T](arr: Array[T]): T
  def unzip[T,U](pairs: Array[(T,U)]): (Array[T], Array[U])
  def dotProductSV[T](
      indices1: Array[Int], values1: Array[T],
      indices2: Array[Int], values2: Array[T]): T

Figure 4. Core language primitives

In these implementations we use the core language primitives
shown in Figure 4.
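
To make the intended meaning of some of these primitives concrete, here is a hedged sketch over plain `Array[Double]`. The function names `zipMul` and `gather` are stand-ins for `|*|` and `apply(indices)`, and the merge-based `dotProductSV`, which assumes both index arrays are sorted in ascending order, is only one plausible reading of that primitive.

```scala
object CorePrimitivesSketch {
  // Element-wise product, standing in for the |*| method of Figure 4.
  def zipMul(a: Array[Double], b: Array[Double]): Array[Double] =
    a.zip(b).map { case (x, y) => x * y }

  // Gather, standing in for apply(indices: Array[Int]).
  def gather(a: Array[Double], indices: Array[Int]): Array[Double] =
    indices.map(i => a(i))

  // Sum of all elements.
  def sum(a: Array[Double]): Double = a.sum

  // Sparse-sparse dot product by merging two index arrays.
  // Assumption: both index arrays are sorted in ascending order.
  def dotProductSV(is1: Array[Int], vs1: Array[Double],
                   is2: Array[Int], vs2: Array[Double]): Double = {
    var i = 0
    var j = 0
    var acc = 0.0
    while (i < is1.length && j < is2.length) {
      if (is1(i) == is2(j)) { acc += vs1(i) * vs2(j); i += 1; j += 1 }
      else if (is1(i) < is2(j)) i += 1
      else j += 1
    }
    acc
  }
}
```

With these definitions, the dense-sparse case of dotProduct in Figure 2 corresponds to `sum(zipMul(values, gather(arr, indices)))`.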

Corresponding isomorphisms for all these concrete implementations
are presented in Figure 5. Note that these isomorphisms can
be automatically generated for each concrete class, which we do in 
our implementation (see section POV 

  type DVData[T] = Array[T]
  class DVIso[T] extends Iso[DVData[T], DenseVec[T]] {
    def to(x: DVData[T]) = new DenseVec(x)
    def from(dv: DenseVec[T]) = dv.arr
  }

  type SVData[T] = (Array[Int], (Array[T], Int))
  class SVIso[T] extends Iso[SVData[T], SparseVec[T]] {
    def to(x: SVData[T]) =
      new SparseVec(x._1, x._2._1, x._2._2)
    def from(sv: SparseVec[T]) =
      (sv.indices, (sv.values, sv.length))
  }

  type DMData[T] = Array[DenseVec[T]]
  class DMIso[T] extends Iso[DMData[T], DenseMatr[T]] {
    def to(x: DMData[T]) = new DenseMatr(x)
    def from(dm: DenseMatr[T]) = dm.rows
  }

  type SMData[T] = Array[SparseVec[T]]
  class SMIso[T] extends Iso[SMData[T], SparseMatr[T]] {
    def to(x: SMData[T]) = new SparseMatr(x)
    def from(sm: SparseMatr[T]) = sm.rows
  }

Figure 5. Isomorphisms for matrices and vectors

Now we can illustrate how isomorphic specialization works. 

In order to call the mvm function with concrete implementations
of matrices and vectors we need to: 1) wrap the input core data in
concrete objects; 2) call mvm with the created objects; and 3) extract
the resulting data.

This three-step process is important. It doesn't matter how many 
objects we will create from the core data and in how many functions 
we use them. As long as we extract all the data from objects 
we can be sure that isomorphic specialization will specialize all 
method invocations (like rows and dotProduct) with respect to 
the concrete implementations that are used. 

In order to simplify our further description, let's limit ourselves
to dense vectors and create wrapper functions that just do this three-step
process. The code is shown in Figure 6.



  def dmdvm(m: Array[Array[T]], v: Array[T]): Array[T] = {
    val dm = new DenseMatr(m.map(r => new DenseVec(r)))
    val dv = new DenseVec(v)
    val res = mvm(dm, dv)
    res.arr // extract data values from Vec
  }

  def smdvm(m: Array[(Array[Int], (Array[T], Int))],
            v: Array[T]): Array[T] = {
    val rs = m.map((is, (vs, l)) => new SparseVec(is, vs, l))
    val sm = new SparseMatr(rs, v.length)
    val dv = new DenseVec(v)
    val res = mvm(sm, dv)
    res.arr
  }

Figure 6. Wrapper functions

  def dmdvm_spec(m: Array[Array[T]],
                 v: Array[T]): Array[T] =
    m map { row => sum(row |*| v) }

  def smdvm_spec(
      m: Array[(Array[Int], (Array[T], Int))],
      v: Array[T]): Array[T] =
    m map { r =>
      val indices = r._1
      val values = r._2._1
      sum(values |*| v(indices))
    }

Figure 7. Result of isomorphic specialization

Now, if we apply isomorphic specialization to these wrappers, it
will generate the functions shown in Figure 7, which contain only
core language primitives, used in a way that is specific
to the concrete implementation of the abstract data types. For this
particular example, the mvm invocations in the wrappers will be inlined
into the wrapper bodies. This will bring in other invocations, like rows and
dotProduct, which will also be inlined. All the inlining happens
dynamically, following the evaluation semantics of virtual
method calls.

Note how fragments of code from concrete implementations 
are mixed in the resulting specialized versions. Staged evaluation 
implements a dynamic dispatch mechanism of method calls. 
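
The relationship between the wrappers and their specialized versions can be checked on ordinary data. The sketch below is a simplified, non-staged rendering under several assumptions: elements are `Double`, `Vector` and `Seq` replace the covariant array wrapper mentioned in the footnote, `mvm` returns a `DenseVec` so that the result can be unwrapped, and the functions `dmdvmSpec` and `smdvmSpec` are written by hand rather than generated.

```scala
object MvmSketch {
  trait Vec {
    def length: Int
    def dotProduct(vec: Vec): Double
  }
  trait Matr { def rows: Seq[Vec] }

  // sum(a |*| b) on plain collections.
  private def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  class DenseVec(val arr: Vector[Double]) extends Vec {
    def length: Int = arr.length
    def dotProduct(vec: Vec): Double = vec match {
      case dv: DenseVec  => dot(arr, dv.arr)
      case sv: SparseVec => dot(sv.values, sv.indices.map(i => arr(i)))
      case _             => sys.error("unknown vector representation")
    }
  }
  class SparseVec(val indices: Vector[Int], val values: Vector[Double],
                  val length: Int) extends Vec {
    def dotProduct(vec: Vec): Double = vec match {
      case dv: DenseVec => dv.dotProduct(this)
      case _            => sys.error("sparse-sparse case omitted in this sketch")
    }
  }
  class DenseMatr(val rows: Vector[DenseVec]) extends Matr
  class SparseMatr(val rows: Vector[SparseVec]) extends Matr

  // Abstract algorithm: uses only the interfaces.
  def mvm(m: Matr, v: Vec): DenseVec =
    new DenseVec(m.rows.map(r => r.dotProduct(v)).toVector)

  // Wrappers: wrap core data, call mvm, extract core data.
  def dmdvm(m: Vector[Vector[Double]], v: Vector[Double]): Vector[Double] = {
    val dm = new DenseMatr(m.map(r => new DenseVec(r)))
    mvm(dm, new DenseVec(v)).arr
  }
  def smdvm(m: Vector[(Vector[Int], (Vector[Double], Int))],
            v: Vector[Double]): Vector[Double] = {
    val sm = new SparseMatr(m.map { case (is, (vs, l)) => new SparseVec(is, vs, l) })
    mvm(sm, new DenseVec(v)).arr
  }

  // Hand-written counterparts of Figure 7: core operations only.
  def dmdvmSpec(m: Vector[Vector[Double]], v: Vector[Double]): Vector[Double] =
    m.map(row => dot(row, v))
  def smdvmSpec(m: Vector[(Vector[Int], (Vector[Double], Int))],
                v: Vector[Double]): Vector[Double] =
    m.map { r =>
      val indices = r._1
      val values = r._2._1
      dot(values, indices.map(i => v(i)))
    }
}
```

On the same input, each wrapper and its specialized counterpart return the same vector; the difference is that the specialized versions allocate no wrapper objects and perform no virtual calls.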

Isomorphisms as first-class objects are subject to staged evaluation.
They can also compose to form new isomorphisms. These
two factors are the key to the formulation of the lifting of isos,
which is described in Section 3.5.



3. Formalization 

In this section we describe two languages. The first one is the target 
for our specialization procedure (we call it the core language). The 
second one (called FJ) is an extension of the core language with
a very limited set of object-oriented constructs necessary to illustrate
our algorithms and approach. FJ is inspired by Featherweight
Java [16], and we try to stay close to its formulation.

We describe our proposed technique, which we call Staged 
Evaluation, for transforming FJ terms into a DAG-based interme- 
diate representation. 

We also describe an algorithm which transforms FJ programs 
into equivalent (with respect to user-defined isomorphisms between 
FJ types and core types) core language programs. 

We use the overline $\overline{x}$ and the indexed expression $\{x_i\}_{i=1}^{n}$ as shorthand
notation for the list $(x_1, \ldots, x_n)$. We also allow these lists to
be empty when n = 0, and we often omit the i = 1 part and assume it by
default. The index i in this case is always bound by this notation.

3.1 Core Language

Figure 8 summarizes the syntax of our core language. It is an
explicitly typed lambda calculus enriched with pairs, sums, arrays
and pattern matching case expressions.



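\[
\begin{array}{rcll}
\mathcal{T} \ni \tau & ::= & \mathrm{Unit} \mid \mathrm{Int} & \text{base types} \\
 & \mid & (\tau_1 \times \tau_2) & \text{binary product type} \\
 & \mid & (\tau_1 + \tau_2) & \text{sum type} \\
 & \mid & (\tau_1 \to \tau_2) & \text{function type} \\
 & \mid & \mathrm{Array}[\tau] & \text{array type} \\
\mathit{Term} \ni e & ::= & l \mid x : \tau & \text{integer literals and vars} \\
 & \mid & \lambda(x : \tau).e \mid e_1\ e_2 & \text{functions} \\
 & \mid & k\ \overline{e} & \text{constructors} \\
 & \mid & \delta\ \overline{e} & \text{primitives} \\
 & \mid & \mathrm{case}\ e\ \mathrm{of}\ \{\overline{p_i \to e_i}\} & \text{case analysis} \\
k & ::= & () \mid (\_,\_) & \text{unit and pair} \\
 & \mid & l[\tau_1,\tau_2] \cdot \_ \mid r[\tau_1,\tau_2] \cdot \_ & \text{injections} \\
\delta & ::= & \mathrm{fst}\ \_ \mid \mathrm{snd}\ \_ & \text{projections} \\
 & \mid & \_ \oplus \_ & \text{binary operations} \\
p & ::= & l \mid k\ \overline{x} & \text{patterns} \\
v & ::= & l \mid k\ \overline{v} \mid \lambda(x : \tau).e & \text{values}
\end{array}
\]

Figure 8. Syntax of the core language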

We assign types to the terms in a standard way following the typing
judgments shown in Figure 9.

Although type annotations play an important role in staged 
evaluation we don't describe any sort of type checking to ensure 
that functions are applied to arguments of appropriate types. In 
other words we consider only well-typed terms. 



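\[
\Gamma, x : \tau \vdash x : \tau
\qquad
\Gamma \vdash l : \mathrm{Int}
\qquad
\Gamma \vdash () : \mathrm{Unit}
\]
\[
\frac{\oplus : (\tau_1 \times \tau_2) \to \tau_3 \quad \Gamma \vdash e_1 : \tau_1 \quad \Gamma \vdash e_2 : \tau_2}
     {\Gamma \vdash e_1 \oplus e_2 : \tau_3}
\]
\[
\frac{\Gamma \vdash e : \tau_1 \times \tau_2}{\Gamma \vdash \mathrm{fst}\ e : \tau_1}
\qquad
\frac{\Gamma \vdash e : \tau_1 \times \tau_2}{\Gamma \vdash \mathrm{snd}\ e : \tau_2}
\qquad
\frac{\Gamma \vdash e_1 : \tau_1 \quad \Gamma \vdash e_2 : \tau_2}{\Gamma \vdash (e_1, e_2) : \tau_1 \times \tau_2}
\]
\[
\frac{\Gamma \vdash e : \tau_1}{\Gamma \vdash l[\tau_1, \tau_2] \cdot e : \tau_1 + \tau_2}
\qquad
\frac{\Gamma \vdash e : \tau_2}{\Gamma \vdash r[\tau_1, \tau_2] \cdot e : \tau_1 + \tau_2}
\]
\[
\frac{\Gamma \vdash e : \tau_1 + \tau_2 \quad \Gamma, x_1 : \tau_1 \vdash e_1 : \tau \quad \Gamma, x_2 : \tau_2 \vdash e_2 : \tau}
     {\Gamma \vdash \mathrm{case}\ e\ \mathrm{of}\ \{ l[\tau_1, \tau_2] \cdot x_1 \to e_1 ;\ r[\tau_1, \tau_2] \cdot x_2 \to e_2 \} : \tau}
\]
\[
\frac{\Gamma \vdash e : \mathrm{Int} \quad \overline{\Gamma \vdash e_i : \tau}}
     {\Gamma \vdash \mathrm{case}\ e\ \mathrm{of}\ \{\overline{l_i \to e_i}\} : \tau}
\]
\[
\frac{\Gamma, x : \tau_1 \vdash e : \tau_2}{\Gamma \vdash \lambda(x : \tau_1).e : \tau_1 \to \tau_2}
\qquad
\frac{\Gamma \vdash e_1 : \tau_2 \to \tau \quad \Gamma \vdash e_2 : \tau_2}{\Gamma \vdash e_1\ e_2 : \tau}
\]

Figure 9. Typing judgments of the core language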

Note that each well-typed term has exactly one type (or later, a 
single most-specific type). This will allow us to reify types as part 
of terms. 

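Call-by-value reduction contexts

\[
\mathcal{E} ::= [\,] \mid k\ \overline{v}\ \mathcal{E}\ \overline{e} \mid \delta\ \overline{v}\ \mathcal{E}\ \overline{e} \mid \mathcal{E}\ e \mid (\lambda x.e)\ \mathcal{E} \mid \mathrm{case}\ \mathcal{E}\ \mathrm{of}\ \{\overline{p_i \to e_i}\}
\]

Call-by-value evaluation relation

\[
\begin{array}{rcll}
[(\lambda x.e)\ v]_{\mathcal{E}} & \mapsto & [[v/x]e]_{\mathcal{E}} & (1) \\
[\mathrm{case}\ k\ \overline{v}\ \mathrm{of}\ \{\overline{k_i\ \overline{x_i} \to e_i}\}]_{\mathcal{E}} & \mapsto & [[\overline{v}/\overline{x_j}]e_j]_{\mathcal{E}},\ \text{if } k = k_j & (2) \\
[\mathrm{case}\ l\ \mathrm{of}\ \{\overline{l_i \to e_i}\}]_{\mathcal{E}} & \mapsto & [e_j]_{\mathcal{E}},\ \text{if } l = l_j & (3) \\
[\delta\ \overline{v}]_{\mathcal{E}} & \mapsto & [l]_{\mathcal{E}},\ \text{if } l = [\![\delta]\!]\ \overline{v} & (4)
\end{array}
\]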

The standard call-by-value evaluation semantics of the core
language, which is shown in Figure 10, doesn't use reified types.
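
For readers who prefer code to reduction rules, the following is a small evaluator for a fragment of the core language (integer literals, functions, pairs, projections and one binary operation). It is a sketch under stated assumptions: it is big-step rather than the context-based formulation used in this paper, it handles closed terms only, it erases type annotations, and all constructor names are chosen here for illustration.

```scala
object CoreEvalSketch {
  sealed trait Term
  final case class Lit(n: Int) extends Term
  final case class Var(x: String) extends Term
  final case class Lam(x: String, body: Term) extends Term
  final case class App(f: Term, arg: Term) extends Term
  final case class Pair(a: Term, b: Term) extends Term
  final case class Fst(e: Term) extends Term
  final case class Snd(e: Term) extends Term
  final case class Add(a: Term, b: Term) extends Term

  // [v/x]e for a closed value v, so variable capture cannot occur.
  def subst(e: Term, x: String, v: Term): Term = e match {
    case Lit(_)     => e
    case Var(y)     => if (y == x) v else e
    case Lam(y, b)  => if (y == x) e else Lam(y, subst(b, x, v))
    case App(f, a)  => App(subst(f, x, v), subst(a, x, v))
    case Pair(a, b) => Pair(subst(a, x, v), subst(b, x, v))
    case Fst(a)     => Fst(subst(a, x, v))
    case Snd(a)     => Snd(subst(a, x, v))
    case Add(a, b)  => Add(subst(a, x, v), subst(b, x, v))
  }

  // Call-by-value: arguments are evaluated to values before substitution.
  def eval(e: Term): Term = e match {
    case Lit(_) | Lam(_, _) => e
    case Pair(a, b)         => Pair(eval(a), eval(b))
    case App(f, a) => eval(f) match {
      case Lam(x, body) => eval(subst(body, x, eval(a))) // beta step, rule (1)
      case other        => sys.error(s"not a function: $other")
    }
    case Fst(p) => eval(p) match { // primitive application, rule (4)
      case Pair(a, _) => a
      case other      => sys.error(s"not a pair: $other")
    }
    case Snd(p) => eval(p) match {
      case Pair(_, b) => b
      case other      => sys.error(s"not a pair: $other")
    }
    case Add(a, b) => (eval(a), eval(b)) match {
      case (Lit(m), Lit(n)) => Lit(m + n)
      case other            => sys.error(s"not integers: $other")
    }
    case Var(x) => sys.error(s"unbound variable $x")
  }
}
```

Staged evaluation, described next for FJ, follows the same evaluation order but produces graph nodes where this evaluator produces values.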

3.2 FJ Language 

Figure 11 summarizes the extensions to the core language with
additional object-oriented constructs. We use Scala-like syntax for
fields and methods. We denote types from the core language with
$\tau$ and we use $\sigma$ to denote interfaces and classes. We will refer to
classes and interfaces as object types and to the types constructed
from them as FJ types.



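Figure 10. Evaluation semantics of the core language

\[
\begin{array}{rcll}
\tau & ::= & \ldots \mid \sigma & \text{extended types} \\
\sigma & ::= & I \mid C & \text{object types} \\
\mathcal{P} & ::= & \overline{cd}\ e & \text{program} \\
cd & ::= & & \text{declaration} \\
 & & \mathtt{trait}\ I\ \{\overline{ms}\} & \text{interface} \\
 & \mid & \mathtt{class}\ C\ \mathtt{extends}\ I\ \{\overline{fd}\ \overline{md}\} & \text{class} \\
ms & ::= & \mathtt{def}\ m(\overline{x : \sigma}) : \sigma & \text{method signature} \\
fd & ::= & \mathtt{val}\ f : \sigma & \text{field} \\
md & ::= & \mathtt{def}\ m(\overline{x : \sigma}) : \sigma = e & \text{method definition} \\
e & ::= & \ldots & \text{extended expressions} \\
 & \mid & e.f & \text{field selection} \\
 & \mid & e.m(\overline{e}) & \text{method invocation} \\
 & \mid & \mathtt{new}\ C(\overline{e}) & \text{instance} \\
v & ::= & \ldots & \text{extended values} \\
 & \mid & \mathtt{new}\ C(\overline{v}) & \text{objects}
\end{array}
\]

Figure 11. FJ language syntax. C, f, m are class, field, and
method names respectively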

We assume that a correct program induces a number of utility
functions that we will use in the typing rules. First, we assume
the function fields($\sigma$), which returns a sequence $\overline{\mathtt{val}\ f : \sigma}$ pairing each field
of a class or interface with its type, for all the fields declared in
type $\sigma$. Second, we assume the partial function ftype, which is a
map from an FJ type and a field name to a type. Thus ftype($\sigma$, f)
returns the type of the field f in the class or interface $\sigma$. Third,
we assume a partial function mtype that is a map from an object
type and a method name to a type signature. For example, we write
mtype(C, m) = $\overline{\sigma} \to \phi$ when class C contains a method m with
formal parameters of types $\overline{\sigma}$ and return type $\phi$. Similarly, the body
of the method m of the object type $\sigma$, written mbody($\sigma$, m), is a
pair $(\overline{x}, e)$ of a sequence of parameters $\overline{x}$ and an expression e.

Furthermore, we assume that for each interface I there exists at
least one class C which implements I. A field f of a class C can
implement the (argument-less) method f() in the interface I.

There are no assignments, inheritance, super calls, object identity,
exceptions, or access control in FJ. Each class has exactly
one constructor, which takes all the fields as arguments, in the order
specified in the class declaration. There are no statements, and a
method body is simply an expression. We use integers, arithmetic
operations, and case pattern matching instead of conditional
expressions. All references to this are explicit. Overloaded method
references are resolved statically by including argument types as part
of the method name. FJ permits recursive class dependencies with
the full generality of Java. A class can refer to types and call
constructors of any other class. Note, however, that recursion in
methods is not supported.

The typing and semantics extensions for FJ with respect to
the core language are given by extended evaluation contexts, two
additional primitive reduction rules and three typing judgments
shown in Figure 12. The FJ type system is sound and decidable.
Please see (lal for further details. 

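Subtyping

\[
C <: C
\qquad
\frac{C <: D \quad D <: E}{C <: E}
\qquad
\frac{\mathtt{class}\ C\ \mathtt{extends}\ I\ \{\ldots\}}{C <: I}
\]

Expression typing

\[
\frac{\Gamma \vdash e_0 : C_0 \quad \mathit{fields}(C_0) = \overline{\mathtt{val}\ f : C}}
     {\Gamma \vdash e_0.f_i : C_i}
\]
\[
\frac{\Gamma \vdash e_0 : C_0 \quad \mathit{mtype}(m, C_0) = \overline{D} \to C \quad \Gamma \vdash \overline{e} : \overline{E} \quad \overline{E} <: \overline{D}}
     {\Gamma \vdash e_0.m(\overline{e}) : C}
\]
\[
\frac{\mathit{fields}(C) = \overline{\mathtt{val}\ f : D} \quad \Gamma \vdash \overline{e} : \overline{E} \quad \overline{E} <: \overline{D}}
     {\Gamma \vdash \mathtt{new}\ C(\overline{e}) : C}
\]

Call-by-value evaluation contexts

\[
\mathcal{E} ::= \ldots \mid \mathcal{E}.f \mid \mathcal{E}.m(\overline{e}) \mid v.m(\overline{v}\ \mathcal{E}\ \overline{e}) \mid \mathtt{new}\ C(\overline{v}\ \mathcal{E}\ \overline{e})
\]

Call-by-value evaluation relation

\[
\begin{array}{rcll}
[\mathtt{new}\ C(\overline{v}).f_i]_{\mathcal{E}} & \mapsto & [v_i]_{\mathcal{E}},\ \text{if } \mathit{fields}(C) = \overline{\mathtt{val}\ f : C} & (5) \\
[\mathtt{new}\ C(\overline{v}).m(\overline{u})]_{\mathcal{E}} & \mapsto & [[\mathtt{new}\ C(\overline{v})/\mathtt{this};\ \overline{u}/\overline{x}]e_0]_{\mathcal{E}},\ \text{if } \mathit{mbody}(m, C) = (\overline{x}, e_0) & (6)
\end{array}
\]

Figure 12. FJ typing and evaluation semantics

3.3 Isomorphisms

For each class an isomorphic representation is defined based on
its fields. Given a class C with the fields $\{\mathtt{val}\ f_i : \sigma_i\}^{n}$, the
function reptype(C) returns the type Unit if n = 0, $\sigma_1$ if n = 1, and
$(\sigma_1, (\ldots, (\sigma_{n-1}, \sigma_n)))$ otherwise. For example

reptype(SparseVec[T]) = (Array[Int], (Array[T], Int)).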

For each FJ class $C$ there exists a special class $\mathit{Iso}_C$
which implements the interface Iso[reptype(C), C] (see Figure 3).
Note that Iso is not a generic (polymorphic) interface as FJ doesn't
support generics. Rather, it is a template which produces an
interface when instantiated with type parameters.

Thus, given any class $C$, the function $\mathit{iso}(C)$ returns an
instance of $\mathit{Iso}_C$. This instance represents an
isomorphism between a core type $\tau$ and the FJ class $C$.⁴

The isomorphisms returned by iso are called primary. They
can be composed to form other composite isomorphisms. This is
discussed in Section 3.5.
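The relation between a class, its representation type and its primary isomorphism can be sketched in plain Scala. This is only an illustration under stated assumptions, not the framework's actual code: the trait `Iso`, the case class `SparseVec` and the helper `sparseVecIso` are hypothetical names, and `Vector` stands in for `Array` so that values compare structurally.

```scala
object IsoSketch {
  // Hypothetical rendering of the Iso template: `to` maps the core
  // representation to the class, `from` maps the class back.
  trait Iso[A, B] {
    def to(a: A): B
    def from(b: B): A
  }

  // A concrete class with three fields, so its representation type
  // is a right-nested pair, as reptype prescribes for n > 1.
  final case class SparseVec[T](indices: Vector[Int], values: Vector[T], length: Int)

  type SparseRep[T] = (Vector[Int], (Vector[T], Int))

  // Plays the role of iso(SparseVec): a primary isomorphism.
  def sparseVecIso[T]: Iso[SparseRep[T], SparseVec[T]] =
    new Iso[SparseRep[T], SparseVec[T]] {
      def to(a: SparseRep[T]): SparseVec[T] = SparseVec(a._1, a._2._1, a._2._2)
      def from(b: SparseVec[T]): SparseRep[T] = (b.indices, (b.values, b.length))
    }
}
```

As the footnote on subsets points out, nothing in such a definition checks that a given tuple is a valid sparse vector; the isomorphism is postulated by whoever writes the class.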

Now suppose that we have the following definitions

class C1 extends I {...}; class C2 extends I {...}

We will refer to the interface I as an abstract type and the
implementing classes C1 and C2 as concrete implementations or
concrete classes of this abstract type. By using this abstract vs.
concrete terminology we will always implicitly assume this
connection with classes, interfaces and isomorphism instances
defined above.

For any abstract type I, every concrete implementation of I
defines both an alternative representation of instance data and a
concrete implementation of the methods declared in I.

We assume that all abstract types are closed, that is, all their 
concrete implementations are known in advance. Our implementa- 
tion doesn't have this limitation but we assume it here to simplify 
discussion. 

If we want all concrete representations of an abstract type I to
be inter-convertible, we need to impose an additional convertibility
requirement on concrete classes. Namely, all of their fields must
implement some property-method in I. E.g. Vec[T] will have to
include methods arr (from DenseVec[T]), indices and values
(from SparseVec[T]). But this is not required for isomorphic
specialization to work.

⁴ Strictly speaking, $\mathit{Iso}_C[\tau, C]$ defines an isomorphism
not between $\tau$ and $C$ (for example, not every
(Array[Int], (Array[T], Int)) corresponds to a SparseVec[T]) but
between some subset $T \subset \tau$ and $C$. This in particular
means that isos are postulated and cannot be inferred. We allow
ourselves to be a bit sloppy and implicitly assume such a subset $T$
in further discussion.

3.4 Staged Evaluation 

We have already mentioned that staged evaluation can be formal- 
ized as zipper-based traversal of program terms where one-hole 
contexts are generically constructed from evaluation reduction con- 
texts of the language. We refer an interested reader to related pub- 
lications fTlfTai . 

This section describes a staged evaluation algorithm for FJ.
We represent programs for staged evaluation as finite mappings
$\{\alpha \mapsto \mathit{node}\}$ from identifiers (or addresses)
to nodes where the set of nodes is defined by the grammar given in
Figure 13.⁵ Every address referenced by a node must itself be mapped
to a node. We consider this mapping as a graph where the incoming
edges of each node are given by the addresses it contains. This
graph must be acyclic (i.e. recursive definitions aren't allowed).
In this paper we use DAG (directed acyclic graph), graph and program
graph as synonyms.
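A program graph of this kind can be modelled directly as an immutable map. The following is a minimal sketch, assuming integer addresses and only two node kinds; the names `Const`, `Prim`, `isClosed` and `isAcyclic` are invented for the illustration and do not come from the paper.

```scala
object DagSketch {
  type Addr = Int

  sealed trait Node
  final case class Const(value: Int) extends Node
  final case class Prim(name: String, args: List[Addr]) extends Node

  // A program graph: a finite mapping from addresses to nodes.
  type Dag = Map[Addr, Node]

  // Addresses a node refers to (its edges).
  def refs(n: Node): List[Addr] = n match {
    case Const(_)      => Nil
    case Prim(_, args) => args
  }

  // Every address referenced by a node must itself be mapped to a node.
  def isClosed(dag: Dag): Boolean =
    dag.values.forall(n => refs(n).forall(dag.contains))

  // Depth-first search; `path` holds the addresses currently being
  // visited, `done` the addresses already known to be cycle-free.
  def isAcyclic(dag: Dag): Boolean = {
    def visit(a: Addr, path: Set[Addr], done: Set[Addr]): Option[Set[Addr]] =
      if (path(a)) None
      else if (done(a)) Some(done)
      else {
        val start: Option[Set[Addr]] = Some(done)
        val children = dag.get(a).toList.flatMap(refs)
        val result = children.foldLeft(start) { (acc, b) =>
          acc.flatMap(d => visit(b, path + a, d))
        }
        result.map(_ + a)
      }
    val init: Option[Set[Addr]] = Some(Set.empty[Addr])
    dag.keys.foldLeft(init)((acc, a) => acc.flatMap(d => visit(a, Set.empty, d))).isDefined
  }
}
```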

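$$
\begin{array}{rcll}
\mathit{Node} & ::= & x & \text{variable} \\
 & \mid & d\ \overline{\alpha} & \text{definition node} \\
 & \mid & \lambda\alpha.\beta & \text{function} \\
 & \mid & \mathit{case}\ \gamma\ \mathit{of}\ \{p_i \to \beta_i\} & \text{pattern matching} \\[4pt]
d & ::= & l & \text{constant} \\
 & \mid & k & \text{constructor} \\
 & \mid & \delta & \text{primitive} \\
 & \mid & \mathit{new}\ C & \text{class constructor} \\
 & \mid & I.m & \text{interface method} \\
 & \mid & C.m & \text{class method} \\[4pt]
\delta & ::= & \ldots & \text{as in the core language} \\
 & \mid & \mathit{app} & \text{function application} \\
 & \mid & \mathit{mcall} & \text{method call} \\
 & \mid & \mathit{mdef} & \text{method definition} \\[4pt]
p & ::= & l \mid k\ \overline{\alpha} & \text{case patterns}
\end{array}
$$

Figure 13. DAG nodes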

We assume $\mathcal{K}$ is a set of constructors, $\mathcal{L}$ is a
set of constants, $\mathcal{V}$ is a set of term variables,
$\mathcal{P}$ is a set of primitives, $\mathit{CLS}$ is a set of
classes and $C$ ranges over $\mathit{CLS}$, $\mathit{ABS}$ is a set
of abstract types and $I$ ranges over $\mathit{ABS}$.

$\Delta$ ranges over the set $\mathit{Dag}$ of all DAGs, Greek lower
case letters $\alpha$, $\beta$, $\gamma$ and $\nu$ range over
addresses in a given DAG, $\mathit{node}$ (or $n$ for short) ranges
over nodes. $I.m$ ranges over interface methods, $C.m$ ranges over
class methods (as well as fields, which are treated as zero-argument
methods), and $d$ ranges over the set $\mathcal{D} = \mathcal{K}
\cup \mathcal{L} \cup \mathcal{P} \cup \mathit{CLS} \cup \{I.m\}
\cup \{C.m\}$. The primitive $\mathit{app}(\alpha, \beta)$ denotes
application of the function referenced by $\alpha$ to $\beta$.
$\mathit{mcall}(\gamma, \mu, \overline{\alpha})$ denotes invocation
of the method referenced by the address $\mu$ on the instance
$\gamma$ with arguments $\overline{\alpha}$.
$\mathit{mdef}(C.m, \varphi)$ denotes the method definition for
$C.m$, where $\varphi$ is the lambda-abstraction for the body of the
method.

Terms are defined as in Figure 8 except variables are replaced
by addresses, so that nodes are basically terms of depth 0 and 1
(except for method calls and definitions, as described above).

We call a pair of a DAG $\Delta$ and an address $\alpha$ in its
domain a marked DAG and denote it by $\Delta\langle\alpha\rangle$.
$\alpha$ serves as a pointer into $\Delta$ and is called the marked
address. $\mathit{MDag}$ is the set of all marked DAGs.

Pattern-matching function $(\!|\cdot|\!)$ can be used to extract
nodes from DAGs and we write patterns
$\Delta\langle(\!|\mathit{pattern}|\!)\rangle$ to bind the DAG with
$\Delta$ and the node at the marked address with $\mathit{pattern}$.
To simplify handling of nested patterns,
$\Delta\langle(\!|(\alpha, (\beta, \gamma))|\!)\rangle$ is syntactic
sugar for
$\Delta\langle(\!|(\alpha, (\!|(\beta, \gamma)|\!))|\!)\rangle$.
We also use patterns as predicates.

⁵ For $d = l$, $I.m$ or $C.m$, $\overline{\alpha}$ in
$d\ \overline{\alpha}$ is always empty.

We define binding scope as the set of nodes which depend on
variables introduced by lambda-expressions and case-expressions.
It will be convenient to represent it by a term called the scope body
and calculated by the function scope defined in Figure 14.



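$$
\begin{array}{lcl}
\mathit{scope} & : & \overline{\mathit{Addr}} \times \mathit{MDag} \to \mathit{Term} \\
\mathit{scope}(\overline{\alpha}, \Delta\langle\beta\rangle) & \mapsto &
  \mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle\beta\rangle) \\[6pt]
\mathit{scope}' & : & \overline{\mathit{Addr}} \times \mathit{Addr} \times \mathit{MDag} \to \mathit{Term} \\
\mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle\gamma\rangle)
  \ \text{if } \gamma \in \mathit{free}(\overline{\alpha}, \Delta\langle\beta\rangle) & \mapsto & \gamma \\
\mathit{scope}'(\_, \beta, \Delta\langle\alpha @ (\!|x|\!)\rangle) & \mapsto & \alpha \\
\mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle(\!|d\ \overline{\gamma_i}|\!)\rangle) & \mapsto &
  d\ \overline{\mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle\gamma_i\rangle)} \\
\mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle(\!|\lambda\alpha'.\gamma|\!)\rangle) & \mapsto &
  \lambda\alpha'.\mathit{scope}(\alpha', \Delta\langle\gamma\rangle) \\
\mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle(\!|\mathit{node}|\!)\rangle)
  \ \text{if } \mathit{node} = \mathit{case}\ \gamma\ \mathit{of}\ \{k_i\,\overline{\alpha_i} \to \gamma_i\} & \mapsto &
  \mathit{case}\ \mathit{scope}'(\overline{\alpha}, \beta, \Delta\langle\gamma\rangle)\ \mathit{of}\
  \{k_i\,\overline{\alpha_i} \to \mathit{scope}(\overline{\alpha_i}, \Delta\langle\gamma_i\rangle)\}
\end{array}
$$

Free addresses of a binding scope

$$
\begin{array}{l}
\mathit{free}(\overline{\alpha}, \Delta\langle\beta\rangle) \mapsto
  \{\nu \mid \exists \gamma \in S.\ \gamma \multimap \nu\} \setminus S \\
\quad \text{where } S = \{\beta\} \cup (\mathit{Up} \cap \mathit{Down}), \quad
  \mathit{Up} = \{\gamma \mid \beta \multimap^{*} \gamma\}, \quad
  \mathit{Down} = \{\gamma \mid \exists i.\ \gamma \multimap^{*} \alpha_i\}
\end{array}
$$

Dependency relation ($\multimap^{*}$ denotes its transitive closure)

$$
\frac{\Delta\gamma = d\ \overline{\alpha}}{\gamma \multimap \alpha_i}
\qquad
\frac{\Delta\gamma = \lambda\overline{\alpha}.\beta \quad
      \nu \in \mathit{free}(\overline{\alpha}, \Delta\langle\beta\rangle)}
     {\gamma \multimap \nu}
$$

$$
\frac{\Delta\gamma = \mathit{case}\ \gamma_0\ \mathit{of}\ \{p_i\,\overline{\alpha_i} \to \beta_i\}}
     {\gamma \multimap \gamma_0}
\qquad
\frac{\Delta\gamma = \mathit{case}\ \gamma_0\ \mathit{of}\ \{p_i\,\overline{\alpha_i} \to \beta_i\} \quad
      \nu \in \mathit{free}(\overline{\alpha_j}, \Delta\langle\beta_j\rangle)}
     {\gamma \multimap \nu}
$$

(Diagram: a binding scope with its bound variables, Up nodes and
Down nodes.)

Figure 14. Scope body and auxiliary definitions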

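We consider two lambda nodes to be $\alpha$-equivalent when there
exists an address substitution which makes their scope bodies
equal. This is formalized by the following definition:

Definition 1 ($\alpha$-equivalence). $\forall \Delta, \gamma_1,
\gamma_2$, if $\Delta\gamma_1 = \lambda\overline{\alpha_1}.\beta_1$,
$\Delta\gamma_2 = \lambda\overline{\alpha_2}.\beta_2$ and
$\mathit{scope}(\overline{\alpha_1}, \Delta\langle\beta_1\rangle) =
[\overline{\alpha_1}/\overline{\alpha_2}]\,\mathit{scope}(\overline{\alpha_2},
\Delta\langle\beta_2\rangle)$ then
$\lambda\overline{\alpha_1}.\beta_1 = \lambda\overline{\alpha_2}.\beta_2$.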



Our usage of DAGs instead of trees for defining programs was 
originally motivated by the fact that sharing and many other opti- 
mizations are better achieved using graphs. 

But specifically for this paper the key point is that DAGs (em- 
powered by active patterns) allow us to express rewriting rules im- 
plementing isomorphic specialization locally instead of having to 
look arbitrarily deeply into the tree. 

This works in concert with the staged evaluation algorithm,
where the DAGs are constructed from inputs to output in a
breadth-first way while the front is kept in the evaluation stack.
Thus, the structure of the resulting DAG is unknown until the graph
is fully constructed. This process is hard to describe using just
terms with let-bindings.

The staged evaluation algorithm is defined by two functions:
injection and staged evaluation, which are parametrized by an
additional function $\mathit{RW}$. By default $\mathit{RW}$ is
identity, otherwise it specifies some rewriting rules that are
applied while building the graph (and may be mutually recursive with
injection and staged evaluation). We will see an example of
non-identity $\mathit{RW}$ in Section 3.5.

The injection function $\Delta \leftarrow n$ defined in Figure 15
adds a node $n$ to a DAG $\Delta$. It returns
$\Delta'\langle\alpha\rangle$ where $\alpha$ is the address of $n$.
This is a collapsing injection [26], which means that every
different term is represented by a unique sub-DAG. In particular,
when $\Delta$ already contains a node equivalent to $n$,
$\Delta' = \Delta$ and $\alpha \in \mathit{Dom}\ \Delta$.
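The collapsing behaviour can be illustrated with a small hash-consing sketch. It is a simplification under stated assumptions: RW is taken to be the identity, lambda and method nodes (which need scope bodies and $\alpha$-equivalence) are left out, and the names `Dag`, `inject` and `index` are hypothetical.

```scala
object InjectSketch {
  type Addr = Int

  sealed trait Node
  final case class Const(value: Int) extends Node
  final case class Prim(name: String, args: List[Addr]) extends Node

  // The DAG plus a reverse index used to find already-present nodes.
  final case class Dag(nodes: Map[Addr, Node], index: Map[Node, Addr])

  val empty: Dag = Dag(Map.empty, Map.empty)

  // Returns the marked DAG (dag', address). If an equal node is already
  // present, the DAG is returned unchanged together with its address.
  def inject(dag: Dag, n: Node): (Dag, Addr) =
    dag.index.get(n) match {
      case Some(a) => (dag, a)
      case None =>
        val a = dag.nodes.size // fresh: addresses are 0 until size
        (Dag(dag.nodes + (a -> n), dag.index + (n -> a)), a)
    }
}
```

Because child nodes are referred to by address, two structurally equal sub-terms always end up sharing one sub-DAG, which is the property the text calls collapsing.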



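$$
\begin{array}{lcll}
\cdot \leftarrow \cdot & : & \mathit{Dag} \times \mathit{Node} \to \mathit{MDag} & \\
\Delta \leftarrow \gamma & \mapsto & \Delta\langle\gamma\rangle & \\
\Delta \leftarrow x & \mapsto & (\Delta \cup \{\gamma \mapsto x\})\langle\gamma\rangle &
  \text{where } \gamma \text{ is fresh in } \Delta \\[4pt]
\Delta \leftarrow C.m & \mapsto & \mathit{RW}(\Delta\langle\mu\rangle) &
  \text{if } \exists \mu : \Delta\mu = \mathit{mdef}(C.m, \_) \\
 & \mapsto & \mathit{AddNode}(\Delta', \mathit{mdef}(C.m, \varphi)) & \text{otherwise, where} \\
 & & \quad (\overline{x}, e) = \mathit{mbody}(C, m) & \\
 & & \quad \Delta_0\langle\alpha_0\rangle = \Delta \leftarrow \mathit{this} & \\
 & & \quad (\Delta_i\langle\alpha_i\rangle = \Delta_{i-1} \leftarrow x_i)_{i=1}^{n} & \\
 & & \quad \Delta_{n+1}\langle\beta\rangle =
   \mathit{SE}[\![[\alpha_0/\mathit{this};\ \overline{\alpha}/\overline{x}]\,e]\!]\ \Delta_n & \\
 & & \quad \Delta'\langle\varphi\rangle = \Delta_{n+1} \leftarrow \lambda\{\alpha_i\}_{i=0}^{n}.\beta & \\[4pt]
\Delta \leftarrow \lambda\overline{\alpha}.\beta & \mapsto & \mathit{RW}(\Delta\langle\gamma\rangle) &
  \text{if } \exists \gamma : \Delta\gamma = \lambda\overline{\alpha}.\beta \\
 & \mapsto & \mathit{AddNode}(\Delta, \lambda\overline{\alpha}.\beta) & \text{otherwise} \\[4pt]
\Delta \leftarrow \mathit{node} & \mapsto & \mathit{RW}(\Delta\langle\alpha\rangle) &
  \text{if } \exists \alpha : \Delta\alpha = \mathit{node} \\
 & \mapsto & \mathit{AddNode}(\Delta, \mathit{node}) & \text{otherwise} \\[4pt]
\mathit{AddNode}(\Delta, n) & \mapsto & \mathit{RW}((\Delta \cup \{\gamma \mapsto n\})\langle\gamma\rangle) &
  \text{where } \gamma \text{ is fresh in } \Delta
\end{array}
$$

Figure 15. Injection function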

The staged evaluation function $\mathit{SE}[\![e]\!]\ \Delta$
defined in Figure 16 takes a term $e$ and a DAG $\Delta$ and returns
a marked DAG $\Delta'\langle\alpha\rangle$ where $\alpha$
corresponds to the value of the term $e$.

The key intuition for staged evaluation is that it behaves
similarly to the evaluation relation defined in Figures 10 and 12,
but with values in the evaluation contexts replaced by addresses,
and with a stack of contexts instead of just one. Any address
$\alpha$ produced by injection is considered a partially evaluated
value of the term, and $\mathit{SE}'$ is recursively applied to that
value until the context stack becomes empty and the evaluation
finishes (this is the last case in Figure 16).

In other words, staged evaluation uses and calculates new ad- 
dresses in the same way as standard evaluation uses and calculates 
new values. 

Note also that when we use staged evaluation for a method def- 
inition or for a lambda we create fresh addresses for the arguments 
and substitute them in the body. This reflects the call-by-value se- 
mantics of FJ. 
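To make the intuition concrete, here is a deliberately tiny staged evaluator for a language with integer literals and one binary primitive. It is a sketch under assumptions of our own: there are no lambdas, classes or rewriting, and the names `Lit`, `Add`, `Ref`, `AddL`, `AddR` and `stage` are hypothetical. What it does keep from the description above is the explicit stack of contexts and the use of addresses wherever ordinary evaluation would use values.

```scala
object StagedEvalSketch {
  type Addr = Int

  sealed trait Node
  final case class Const(value: Int) extends Node
  final case class Plus(l: Addr, r: Addr) extends Node
  type Dag = Map[Addr, Node]

  sealed trait Term
  final case class Lit(value: Int) extends Term
  final case class Add(l: Term, r: Term) extends Term
  final case class Ref(addr: Addr) extends Term // an address used as a value

  // Evaluation contexts with one hole.
  sealed trait Ctx
  final case class AddL(right: Term) extends Ctx // hole + e
  final case class AddR(left: Addr) extends Ctx  // address + hole

  // Collapsing injection with identity rewriting.
  def inject(dag: Dag, n: Node): (Dag, Addr) =
    dag.find(_._2 == n).map(_._1) match {
      case Some(a) => (dag, a)
      case None =>
        val a = dag.size
        (dag + (a -> n), a)
    }

  def se(t: Term, stack: List[Ctx], dag: Dag): (Dag, Addr) = (t, stack) match {
    case (Lit(i), _) =>
      val (d, a) = inject(dag, Const(i))
      se(Ref(a), stack, d)
    case (Add(l, r), _) =>
      se(l, AddL(r) :: stack, dag)            // focus on the first argument
    case (Ref(a), AddL(r) :: rest) =>
      se(r, AddR(a) :: rest, dag)             // move the hole forward
    case (Ref(a), AddR(l) :: rest) =>
      val (d, s) = inject(dag, Plus(l, a))    // all arguments are addresses
      se(Ref(s), rest, d)
    case (Ref(a), Nil) =>
      (dag, a)                                // empty stack: done
  }

  def stage(t: Term): (Dag, Addr) = se(t, Nil, Map.empty)
}
```

Staging `(1 + 2) + (1 + 2)` with this sketch yields four nodes rather than seven, because the second occurrence of `1 + 2` collapses onto the first.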

Last but not least, we need to explain what the SE algorithm is
doing with respect to the FJ language and its evaluation semantics.
Let's write the algebraic type of FJ expressions (defined in
Figures 8 and 11) as a regular recursive data type using the
notation of [19].
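$$
\begin{array}{lcl}
\mathit{SE}[\![\cdot]\!] & : & \mathit{Term} \to \mathit{Dag} \to \mathit{MDag} \\
\mathit{SE}[\![e]\!]\ \Delta & \mapsto & \mathit{SE}'[\![e]\!]\ \epsilon\ \Delta \\[6pt]
\mathit{SE}'[\![\cdot]\!] & : & \mathit{Term} \to \mathit{Stack}_{\mathcal{E}} \to \mathit{Dag} \to \mathit{MDag} \\
\mathit{SE}'[\![l]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![\alpha]\!]\ \mathcal{R}\ \Delta'
  \quad \text{where } \Delta'\langle\alpha\rangle = \Delta \leftarrow l \\
\mathit{SE}'[\![d\ e_0\ \overline{e}]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![e_0]\!]\ (d\ \Box\ \overline{e} :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\mathit{case}\ e\ \mathit{of}\ \{p_i \to e_i\}]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![e]\!]\ (\mathit{case}\ \Box\ \mathit{of}\ \{p_i \to e_i\} :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![e_1\ e_2]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![e_1]\!]\ (\Box\ e_2 :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\lambda x.e]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![\gamma]\!]\ \mathcal{R}\ \Delta_3 \quad \text{where} \\
 & & \quad \Delta_1\langle\alpha\rangle = \Delta \leftarrow x \\
 & & \quad \Delta_2\langle\beta\rangle = \mathit{SE}'[\![[\alpha/x]\,e]\!]\ \epsilon\ \Delta_1 \\
 & & \quad \Delta_3\langle\gamma\rangle = \Delta_2 \leftarrow \lambda\alpha.\beta \\
\mathit{SE}'[\![e.m(\overline{e})]\!]\ \mathcal{R}\ \Delta & \mapsto &
  \mathit{SE}'[\![e]\!]\ (\Box.m(\overline{e}) :: \mathcal{R})\ \Delta \\[6pt]
\mathit{SE}'[\![\alpha]\!]\ (d\ \overline{\beta}\ \Box\ e_0\ \overline{e} :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![e_0]\!]\ (d\ \overline{\beta}\,\alpha\ \Box\ \overline{e} :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\alpha]\!]\ (d\ \overline{\beta}\ \Box :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![\alpha']\!]\ \mathcal{R}\ \Delta'
  \quad \text{where } \Delta'\langle\alpha'\rangle = \Delta \leftarrow d\ \overline{\beta}\,\alpha \\
\mathit{SE}'[\![\alpha]\!]\ (\mathit{case}\ \Box\ \mathit{of}\ \{p_i\,\overline{x_i} \to e_i\} :: \mathcal{R})\ \Delta_0 & \mapsto &
  \mathit{SE}'[\![\alpha']\!]\ \mathcal{R}\ \Delta'' \quad \text{where} \\
 & & \quad (\Delta_i\langle\overline{\gamma_i}\rangle = \Delta_{i-1} \leftarrow \overline{x_i})^{n};
     \quad \Delta'_0 = \Delta_n \\
 & & \quad (\Delta'_i\langle\beta_i\rangle =
     \mathit{SE}'[\![[\overline{\gamma_i}/\overline{x_i}]\,e_i]\!]\ \epsilon\ \Delta'_{i-1})^{n} \\
 & & \quad \Delta''\langle\alpha'\rangle = \Delta'_n \leftarrow
     \mathit{case}\ \alpha\ \mathit{of}\ \{p_i\,\overline{\gamma_i} \to \beta_i\} \\
\mathit{SE}'[\![\alpha]\!]\ (\Box\ e_2 :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![e_2]\!]\ (\alpha\ \Box :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\alpha]\!]\ (\beta\ \Box :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![\gamma]\!]\ \mathcal{R}\ \Delta'
  \quad \text{where } \Delta'\langle\gamma\rangle = \Delta \leftarrow \mathit{app}(\beta, \alpha) \\
\mathit{SE}'[\![\alpha : \sigma]\!]\ (\Box.m() :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![\nu]\!]\ \mathcal{R}\ \Delta_2 \quad \text{where} \\
 & & \quad \Delta_1\langle\mu\rangle = \Delta \leftarrow \sigma.m \\
 & & \quad \Delta_2\langle\nu\rangle = \Delta_1 \leftarrow \mathit{mcall}(\alpha, \mu, []) \\
\mathit{SE}'[\![\alpha]\!]\ (\Box.m(e_0\ \overline{e}) :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![e_0]\!]\ (\alpha.m(\Box\ \overline{e}) :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\alpha]\!]\ (\beta.m(\overline{\gamma}\ \Box\ e_0\ \overline{e}) :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![e_0]\!]\ (\beta.m(\overline{\gamma}\,\alpha\ \Box\ \overline{e}) :: \mathcal{R})\ \Delta \\
\mathit{SE}'[\![\alpha]\!]\ ((\beta : \sigma).m(\overline{\gamma}\ \Box) :: \mathcal{R})\ \Delta & \mapsto &
  \mathit{SE}'[\![\nu]\!]\ \mathcal{R}\ \Delta_2 \quad \text{where} \\
 & & \quad \Delta_1\langle\mu\rangle = \Delta \leftarrow \sigma.m \\
 & & \quad \Delta_2\langle\nu\rangle = \Delta_1 \leftarrow
     \mathit{mcall}(\beta, \mu, \overline{\gamma} \mathbin{+\!\!+} [\alpha]) \\
\mathit{SE}'[\![\alpha]\!]\ \epsilon\ \Delta & \mapsto & \Delta\langle\alpha\rangle
\end{array}
$$

Figure 16. Staged Evaluation (call-by-value). $\mathcal{R}$ stands
for a stack of evaluation contexts, $\epsilon$ for the empty stack
and $(\mathcal{E} :: \mathcal{R})$ for a stack with $\mathcal{E}$ on
top and $\mathcal{R}$ underneath. $d$ stands for either $k$ or
$\delta$ in $\mathit{Term}$. Fields are treated as zero-argument
methods.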



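$$
\begin{aligned}
E = \mu e.(\ & \mathcal{L} \times e^0 + \mathcal{V} \times e^0 \\
 +\ & \mathcal{K} \times e^n + \mathcal{P} \times e^n \\
 +\ & \lambda(x : \tau).e_1 \times e^0 + e^2 \\
 +\ & \Box.f \times e \\
 +\ & (I.m + C.m) \times e \times e^n \\
 +\ & \mathit{new}\ C \times e^n \\
 +\ & \mathit{case}\ \Box\ \mathit{of}\ \{p_i \to e_i\} \times e\ )
\end{aligned}
$$

Now by formally differentiating the right-hand side as a function of
the variable $e$ we get

$$
\begin{aligned}
\partial E = \mu e.(\ & \mathit{Fin}\ n \times \mathcal{K} \times e^{n-1}
   + \mathit{Fin}\ n \times \mathcal{P} \times e^{n-1} \\
 +\ & \mathit{Fin}\ 2 \times e \\
 +\ & \Box.f \\
 +\ & \mathit{Fin}\ (n + 1) \times (I.m + C.m) \times e^{n} \\
 +\ & \mathit{Fin}\ n \times \mathit{new}\ C \times e^{n-1} \\
 +\ & \mathit{case}\ \Box\ \mathit{of}\ \{p_i \to e_i\}\ )
\end{aligned}
$$

Here $\mathit{Fin}\ n$ is a type which contains exactly $n$
different values; every value of this type can represent an index
$i \in \{1 \ldots n\}$. Thus a value of
$\mathit{Fin}\ n \times e^{n-1}$ can be represented as
$e_1 \ldots e_{i-1}\ \Box\ e_{i+1} \ldots e_n$, i.e. a list of
length $n$ with a hole, and using our overline notation as
$\overline{e}\ \Box\ \overline{e}$.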

Now if we replace the holes with the letter $\mathcal{E}$ and
additionally require expressions before $\mathcal{E}$ to be values
(i.e. addresses) we get exactly the data type of the evaluation
contexts of FJ, which according to Danvy [9] is isomorphic to the
data type of defunctionalized continuations of an evaluation
function of the language.

At the same time $\partial E$ is a type of one-hole contexts for
terms of the language, which we use to represent a state of the SE
algorithm (see context patterns in Figure 16). Note that SE moves
the hole forward in the list by replacing terms with evaluated
values (addresses).
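The "list with a hole" reading of $\mathit{Fin}\ n \times e^{n-1}$ can be written down as a small data type. This is an illustrative sketch only: `ArgsCtx` and `fill` are hypothetical names, `V` stands for evaluated values (addresses) and `E` for terms that are still unevaluated.

```scala
object HoleSketch {
  // One-hole context for an n-ary node: everything before the hole is
  // already a value, everything after it is still a term.
  final case class ArgsCtx[V, E](done: List[V], todo: List[E])

  // Plug a value into the hole. Either the hole moves one position
  // forward (Left: next term to evaluate and the new context), or all
  // arguments are values and the node can be built (Right).
  def fill[V, E](ctx: ArgsCtx[V, E], v: V): Either[(E, ArgsCtx[V, E]), List[V]] =
    ctx.todo match {
      case next :: rest => Left((next, ArgsCtx(ctx.done :+ v, rest)))
      case Nil          => Right(ctx.done :+ v)
    }
}
```

The position of the hole is implicit in the length of `done`, which is one way to encode the $\mathit{Fin}\ n$ index.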



This observation makes it clear that the stack of zipper contexts
corresponds to the defunctionalized continuation and represents the
work which remains to be done by $\mathit{SE}'$ while evaluating the
term.⁶

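Finally, it turns out that applying staged evaluation to a scope
body returns the same address it was extracted from. Formally:

Proposition 1. Let $\mathit{RW}$ be identity, then
$\forall \Delta, \overline{\alpha} \in \mathcal{V}, \beta \in
\mathit{Dom}\ \Delta$:
$\mathit{SE}[\![\mathit{scope}(\overline{\alpha},
\Delta\langle\beta\rangle)]\!]\ \Delta = \Delta\langle\beta\rangle$

Proof (sketch). By induction on the structure of
$\mathit{scope}(\overline{\alpha}, \Delta\langle\beta\rangle)$.

□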

3.5 Isomorphic Specialization 

The generic nature of staged evaluation leads to a generic
formulation of the isomorphic specialization transformation. The
idea is to use the ability of SE to handle terms with DAG addresses
as values, i.e. to evaluate applications $(\mu\ \gamma)\ \alpha$
where $\mu$, $\gamma$, $\alpha$ are addresses. This feature allows
us to integrate local term rewriting rules in a process of global
DAG construction.

Rewriting rules transform a marked DAG $\Delta\langle\gamma\rangle$
into a new marked DAG. This means that each rewriting can change
either the DAG or the address or both. Many new nodes can be added
to the DAG as part of rewriting, but already existing nodes can't
change. The DAG is an immutable data structure.

Function $\mathit{RW}$ applies rewriting rules iteratively until
reaching a fixed point where no more rewrites are possible. This is
important because one rewriting often opens possibilities for
another. Note that each new node is subject to rewriting after it is
added to the DAG by the injection function. As a consequence, it can
be forgotten (in which case it will not be a part of the final
binding scope). The intuition is that the DAG $\Delta$ represents
the universe (or sea) of nodes, the marked address points to some
particular node and everything else happens relative to the marked
address.

⁶ Notably, this construction of zipper-style traversal doesn't work
for call-by-name evaluation contexts because $\partial E$ contains
$(\lambda x.e)\ \mathcal{E}$ and
$k\ \overline{v}\ \mathcal{E}\ \overline{e}$, which aren't CBN
contexts.

class Iso_×[A1, A2, B1, B2](
    val iso1: Iso[A1, B1], val iso2: Iso[A2, B2])
  extends Iso[A1 × A2, B1 × B2] {
  def to(a: A1 × A2) = (iso1.to(fst(a)), iso2.to(snd(a)))
  def from(b: B1 × B2) = (iso1.from(fst(b)), iso2.from(snd(b)))
}

class Iso_+[A1, A2, B1, B2](
    val iso1: Iso[A1, B1], val iso2: Iso[A2, B2])
  extends Iso[A1 + A2, B1 + B2] {
  def to(a: A1 + A2) = case a of {
    l·a1 → l·iso1.to(a1); r·a2 → r·iso2.to(a2)
  }
  def from(b: B1 + B2) = case b of {
    l·b1 → l·iso1.from(b1); r·b2 → r·iso2.from(b2)
  }
}

class Iso_arr[A, B](val iso: Iso[A, B])
  extends Iso[Array[A], Array[B]] {
  def to(as: Array[A]) = as.map(iso.to)
  def from(bs: Array[B]) = bs.map(iso.from)
}

class Iso_→[A1, A2, B1, B2](
    val iso1: Iso[A1, B1], val iso2: Iso[A2, B2])
  extends Iso[A1 → A2, B1 → B2] {
  def to(f: A1 → A2) = b ⇒ iso2.to(f(iso1.from(b)))
  def from(g: B1 → B2) = a ⇒ iso2.from(g(iso1.to(a)))
}

Figure 17. Compositions of isomorphisms

$\mathit{RW}$ works by pattern-matching the node corresponding to
the marked address, and each case can be regarded as one rewrite
rule. Each rule first extracts some subgraph using active patterns,
which we described in Section 3.4. The result is a term with
addresses of the nodes as the values of the variables bound by the
pattern. Thus we have a term on the left-hand side with variables
that can be used on the right-hand side. We can then define a term
on the right using those variables and inject it back into the DAG
by handing the term to SE, which makes SE and $\mathit{RW}$ mutually
recursive.
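The fixed-point behaviour of rewriting on marked DAGs can be sketched as follows. The two algebraic rules here are stand-ins chosen for brevity, not the specialization rules of the paper, and for simplicity they only move the marked address; real rules would also build new nodes by calling staged evaluation. All names are hypothetical.

```scala
object RewriteSketch {
  type Addr = Int

  sealed trait Node
  final case class Const(value: Int) extends Node
  final case class Plus(l: Addr, r: Addr) extends Node
  final case class Times(l: Addr, r: Addr) extends Node

  type Dag = Map[Addr, Node]
  type MDag = (Dag, Addr) // a marked DAG

  // x + 0 and 0 + x rewrite to x: only the marked address changes.
  def plusZero(m: MDag): Option[MDag] = {
    val (dag, a) = m
    dag.get(a) match {
      case Some(Plus(l, r)) if dag.get(r).contains(Const(0)) => Some((dag, l))
      case Some(Plus(l, r)) if dag.get(l).contains(Const(0)) => Some((dag, r))
      case _ => None
    }
  }

  // 1 * x and x * 1 rewrite to x.
  def timesOne(m: MDag): Option[MDag] = {
    val (dag, a) = m
    dag.get(a) match {
      case Some(Times(l, r)) if dag.get(l).contains(Const(1)) => Some((dag, r))
      case Some(Times(l, r)) if dag.get(r).contains(Const(1)) => Some((dag, l))
      case _ => None
    }
  }

  val rules: List[MDag => Option[MDag]] =
    List(m => plusZero(m), m => timesOne(m))

  // Apply the first matching rule and repeat until none applies.
  def rw(m: MDag): MDag =
    rules.iterator.map(rule => rule(m)).collectFirst { case Some(next) => next } match {
      case Some(next) => rw(next)
      case None       => m
    }
}
```

Note that the existing nodes are never modified: a rewrite simply returns a different marked address, and the superseded node is left behind in the graph.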

Thus, we can define graph transformation by specifying a set
of term rewriting rules. Not every transformation can be defined in
this way, but we claim that isomorphic specialization can be defined
as just a set of specially selected rewrite rules in this framework.
These rules are shown in Figure 18.

Let's look at these rules. The idea is to use a special constructor
which we call view and denote as $e \triangleleft \mathit{iso}$ with
the following typing rule:

$$
\frac{\Gamma \vdash e : \tau, \quad \mathit{iso} : \mathit{Iso}[\tau, \sigma]}
     {\Gamma \vdash e \triangleleft \mathit{iso} : \sigma}
$$

The intuition is that each node of this view type represents an
isomorphic connection between a value of the core type $\tau$ and
a value of the FJ type $\sigma$. So, if we define rewriting rules
that systematically move these views along the edges of the DAG,
from bound variables towards the root of the binding scope, then the
DAG which remains after rewriting is complete contains only
the core language nodes and thus it represents a program in the
core language. This resulting program is equivalent to the original
program because every rewriting step preserves semantics.

To define these rules we need to be able to compose primary
isomorphisms and to create new isomorphisms associated with
view nodes. We define one composite isomorphism for each type
constructor of the core language, which allows us to lift views
over types. Composite isomorphisms are shown in Figure 17 and
the method of isomorphic specialization is implemented by the
rewriting rules shown in Figure 18.
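For readers who want to experiment, the product, array and function compositions can be approximated in ordinary Scala. This is a sketch under assumptions rather than the framework's code: `pairIso`, `seqIso`, `funIso`, `Meters` and `metersIso` are hypothetical names, and `Vector` replaces `Array` so that results compare structurally.

```scala
object IsoCompose {
  trait Iso[A, B] {
    def to(a: A): B
    def from(b: B): A
  }

  // Lifts two isomorphisms over the pair type constructor.
  def pairIso[A1, A2, B1, B2](i1: Iso[A1, B1], i2: Iso[A2, B2]): Iso[(A1, A2), (B1, B2)] =
    new Iso[(A1, A2), (B1, B2)] {
      def to(a: (A1, A2)): (B1, B2) = (i1.to(a._1), i2.to(a._2))
      def from(b: (B1, B2)): (A1, A2) = (i1.from(b._1), i2.from(b._2))
    }

  // Lifts an isomorphism over sequences, element by element.
  def seqIso[A, B](i: Iso[A, B]): Iso[Vector[A], Vector[B]] =
    new Iso[Vector[A], Vector[B]] {
      def to(as: Vector[A]): Vector[B] = as.map(i.to)
      def from(bs: Vector[B]): Vector[A] = bs.map(i.from)
    }

  // Lifts two isomorphisms over functions: convert the argument back,
  // apply the function, convert the result forward.
  def funIso[A1, A2, B1, B2](i1: Iso[A1, B1], i2: Iso[A2, B2]): Iso[A1 => A2, B1 => B2] =
    new Iso[A1 => A2, B1 => B2] {
      def to(f: A1 => A2): B1 => B2 = b => i2.to(f(i1.from(b)))
      def from(g: B1 => B2): A1 => A2 = a => i2.from(g(i1.to(a)))
    }

  // A toy primary isomorphism used to exercise the combinators.
  final case class Meters(value: Int)
  val metersIso: Iso[Int, Meters] = new Iso[Int, Meters] {
    def to(a: Int): Meters = Meters(a)
    def from(b: Meters): Int = b.value
  }
}
```

Composites built this way are again isomorphisms whenever their components are, which is what allows views to be lifted over arbitrary core types.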

Together these rules implement isomorphic specialization in 
such a way that the following property holds: if a closed FJ expres- 



sion has a core type then staged evaluation with specializing rewrit- 
ing rules will produce a core language expression which doesn't 
contain any FJ constructs or vestiges of the to/from functions of 
isomorphisms. This is formalized in the next conjecture: 

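Conjecture 1. Let $\mathit{RW}$ be $\mathit{RW}_{spec}$,
$e \in \mathit{Term}_{FJ}$ such that $e$ is closed and $e : \tau$
for some $\tau \in T_{Core}$. Then $\exists e' : \tau,\ \theta :
\mathit{Addr} \to \mathit{Addr}$ such that
$e' \in \mathit{Term}_{Core}$ and
$\mathit{SE}[\![e]\!]\ \{\} = \theta(\mathit{SE}[\![e']\!]\ \{\})$
(where $\{\}$ is the empty DAG).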

This holds even in the presence of multiple concrete implemen- 
tations of any abstract type. This is achieved by 1) applying sys- 
tematic rewriting during staged evaluation; 2) providing domain 
specific rules that lift view nodes towards the output of the DAG; 
and 3) applying staged evaluation to the first-class Iso instances. 

The idea is that to/from pairs are not just compiled away by ap- 
plying the identity rule. Instead, staged evaluation is applied to Iso 
instances themselves (as can be seen from the rules). Each from 
implementation accesses the properties of the concrete class. And 
this is where virtual method invocation semantics of staged evalua- 
tion takes place. It leads to inlining of the concrete implementation 
code selected at evaluation time (i.e. at runtime). 

We don't claim that the conjecture will hold for any pair of 
languages. Domain specificity is important here, as the rules are 
domain-specific. Rather, it is an important property of the core lan- 
guage to be friendly to isomorphisms and to serve as a target of 
isomorphic specialization. In our experiments we observed a mul- 
titude of such friendly languages, which supports our conjecture. 

4. Evaluation 

In a framework with first-class isomorphic specialization we can
automatically specialize a program written in terms of abstract
data types with respect to any concrete representations of those
abstract data types that translate to a given core language. In our
approach this looks like programming with classes and interfaces
of object-oriented programming, with the difference that we are
able to automatically eliminate all abstractions and the
accompanying overhead.

In Section 2 we illustrated our approach using the mvm example.
We showed in Figure 7 two resulting specializations generated for
dense and sparse representations of matrices. In both those cases
the vector has a dense representation. If we also consider a sparse
representation of vectors then we can generate another two
specializations of the original mvm example, which we show in
Figure 19.

In this section we describe results of our experiments and ex- 
plain why we think isomorphic specialization is important. 

In practice, the choice of a particular representation of data 
depends on what kind of input data we have. For example, if 
input data is sparse then it would be reasonable to use sparse 
representation and if the data is dense then dense format would 
be more efficient. This is a common trade-off and typically, in 
order to make a justified choice, all representations should be tried 
out. This is exactly the case where our isomorphic specialization 
would be very useful. Because it is first-class you just need to 
implement all concrete representations of abstract types you are 
interested in and the framework will generate necessary specialized 
versions of your program for you. And importantly, you can think 
about each concrete implementation independently from the other 
implementations. You don't need to worry about how they will be 
mixed into generated specialized versions. This also means that 
if you invent a new representation you just need to implement it 
and add to the system. The system will generate new specialized 
variants automatically. 

$\mathit{RW}_{spec} : \mathit{MDag} \to \mathit{MDag}$

Figure 18. Rewriting

def dmsvm_spec(m: Array[Array[T]],
               v: (Array[Int], (Array[T], Int))): Array[T] = {
  val indices = v._1
  val values = v._2._1
  m.map { row => sum(row(indices) |*| values) }
}

def smsvm_spec(m: Array[(Array[Int], (Array[T], Int))],
               v: (Array[Int], (Array[T], Int))): Array[T] = {
  val indices = v._1
  val values = v._2._1
  m.map { (is, (vs, _)) =>
    dotProductSV(is, vs, indices, values)
  }
}

Figure 19. Result of isomorphic specialization

To show how important it is to eliminate abstraction overhead
we measure the performance of all specialized versions of mvm and
compare them with the original version without specialization and
optimization of array operations. First, we applied isomorphic
specialization to wrapper functions. The DAGs of specialized
versions (shown in Figures 7 and 19) were injected into LMS to
produce optimized Scala code. Then for each experiment we did the
following steps:

1. Randomly generate input matrix and vector data of desired size
and sparseness (percentage of zero values).

2. Convert the input vector and matrix into sparse and dense
representations, as described in Section 2.

3. Run all versions of generated Scala code. 
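Steps 1 and 2 can be pictured with a short sketch. It is not the benchmark code used for the measurements; `randomDense`, `toSparse` and `toDense` are hypothetical helpers, and the sparse layout simply follows the (indices, (values, length)) representation type used for SparseVec.

```scala
object BenchDataSketch {
  import scala.util.Random

  // `sparseness` is the expected fraction of zero entries. Non-zero
  // entries are drawn from [1, 2) so they can never be zero.
  def randomDense(n: Int, sparseness: Double, rnd: Random): Vector[Double] =
    Vector.fill(n)(if (rnd.nextDouble() < sparseness) 0.0 else 1.0 + rnd.nextDouble())

  // Dense to sparse: keep only the non-zero entries and their indices.
  def toSparse(v: Vector[Double]): (Vector[Int], (Vector[Double], Int)) = {
    val nonZero = v.zipWithIndex.filter(_._1 != 0.0)
    (nonZero.map(_._2), (nonZero.map(_._1), v.length))
  }

  // Sparse to dense: fill every missing index with zero.
  def toDense(s: (Vector[Int], (Vector[Double], Int))): Vector[Double] = {
    val (indices, (values, length)) = s
    val byIndex = indices.zip(values).toMap
    Vector.tabulate(length)(i => byIndex.getOrElse(i, 0.0))
  }
}
```

A matrix can be handled the same way by applying these helpers to each row.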

We used the ScalaMeter benchmarking library to measure execution
time. It performs a preliminary warm-up and then executes the given
code repeatedly in order to calculate the average execution time.
The final times in milliseconds for all the experiments are given
in Table 1 and Table 2.

All matrices have the same size: $10^4 \times 10^4$. Accordingly,
the length of input vectors is also $10^4$. $S_m$ is matrix
sparseness and $S_v$ is vector sparseness. The remaining columns
show evaluation time for the original version and each specialized
version.

Table 1. Execution times of original versions of mvm

Table 2. Execution times of specialized and optimized versions of
mvm

These results more or less reflect our intuition about the
performance of different representations. But not all was obvious in
advance: we see that when sparseness of both vectors and matrices
is close to 50%, smdvm or dmsvm perform much better than dmdvm
and smsvm (see line 3), and that smsvm is generally quite slow.
Thus specialization for free really does help with selecting the best
representation instead of merely confirming expected results.

Our implementation is a Scala library which is rather small and
generic. It uses first-class type descriptions and works for any data
types defined by users in their applications.

The proposed isomorphic specialization cannot and should not
replace other optimization techniques. In particular, here we show
how it works in concert with LMS. In terms of optimization, the
main benefit comes from deforestation and loop fusion of array
operations, which are implemented in LMS. It turned out that for
all specialized versions LMS was able to automatically generate
Scala code eliminating unnecessary intermediate arrays and fusing
all the loops. If you were writing it manually you would write
very similar code. This means that we can use LMS as an efficient
implementation of our core functional language with arrays. All we
have to do is to translate domain-specific abstractions to this core
language. This separation of concerns works very well in practice,
as it allows us to build a solution from best-in-class components.

Thus, we have shown how to develop an object-oriented func- 
tional language in which you can create abstractions without wor- 
rying about their performance overhead. 

5. Related work 

This work is based on our previous attempts [28] to combine
generic (aka polytypic) programming techniques and Lightweight
Modular Staging [22] in the context of the Scala language. That work
was mostly focused on technical details of deep DSL embedding 
using some tricks of Scala language. The main idea is that by writ- 
ing programs using a polymorphic embedding style fiUl . they can 
be interpreted in at least two modes: evaluation and code gener- 
ation. In the evaluation mode programs are immediately executed 
using runtime of the host language (Scala). In the code generation 
mode the same code yields a graph-based intermediate represen- 
tation. This was the main motivation for explicit formulation of 
staged evaluation as it is presented in this paper. 
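The two interpretation modes can be illustrated with a very small embedding. This is a generic sketch of the idea, not the Scalan or LMS encoding: the trait `Arith` and the interpreters `Eval` and `Gen` are hypothetical, and the code generation mode produces a string instead of a graph to keep the example short.

```scala
object EmbeddingSketch {
  // The DSL signature, abstracted over the representation type R.
  trait Arith[R] {
    def lit(i: Int): R
    def add(a: R, b: R): R
  }

  // A program written once against the abstract signature.
  def program[R](dsl: Arith[R]): R =
    dsl.add(dsl.lit(1), dsl.add(dsl.lit(2), dsl.lit(3)))

  // Evaluation mode: run immediately on host-language values.
  object Eval extends Arith[Int] {
    def lit(i: Int): Int = i
    def add(a: Int, b: Int): Int = a + b
  }

  // Code generation mode: build a representation of the program.
  object Gen extends Arith[String] {
    def lit(i: Int): String = i.toString
    def add(a: String, b: String): String = s"($a + $b)"
  }
}
```

Replacing the string in `Gen` by addresses into a DAG gives the kind of graph-building interpretation that staged evaluation makes explicit.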

Our current implementation of staged evaluation and isomorphic
specialization is derived from Scalan [28]. We removed everything
related to NDP and generalized the Scalan library in such
a way that NDP could be considered as an application. Similar to
LMS we use Rep[T] based embedding in Scala, but we don't use
Scala-Virtualized [3] for such embedding. Instead we use dynamic
proxies for embedding of user-defined types and to implement
method invocation behavior of staged evaluation. And this is where
the staged evaluation is different from LMS. First, we put global
smart constructors of LMS into user-defined classes where they 
become just methods. Second, we allow class instances as DAG 
nodes. Third, the only implementation of Exp [T] during staging 
is Sym [T] . This makes Exp [T] instances always behave like typed 
references to DAG nodes, so that if e : Exp [Matr [Float] ] then 
semantically e is the address of a node of type Matr [Float] . An- 
other difference from LMS is that rewriting doesn't happen in smart 
constructors. Instead it is defined by a separate set of rules applied 
until fixed point is reached. We found this technical difference very 
convenient in practice as the staging mechanism is separated from 
rewriting. 

In spite of the differences, thanks to flexibility of Scala as the 
host language, we can combine Scalan and LMS into an end-to-end 
solution. After the final graph is created we create an instance of an 
LMS context and inject all the core language primitives produced 
by isomorphic specialization into it. Then the Scala code corre- 
sponding to this instance is generated, leveraging code generation 
and optimizations implemented in LMS. 

Sacher [26] is the main source of inspiration for the notation of
marked DAGs and the collapsing injection. DAG rewriting based
on pattern matching using extractors (or active patterns) is due
to [22]. We found the combined notation of marked DAGs and
active patterns very convenient for our formulation.

The idea of not rebuilding already-present nodes can be traced
back to Sassa and Goto [25], who described what is usually called
hash consing. Kahrs [17] showed how it can be used to implement
fully-collapsed jungles.

We describe the staged evaluation algorithm as a zipper-based
traversal. This formulation is closely related to Danvy's research
[9, 10] on inter-deriving semantic artifacts.

Collapsing injection of terms into DAGs that we perform as
part of staged evaluation corresponds to the approach to common
sub-expression elimination in Rompf [24]. Similarly, dead code
elimination is automatically achieved by respecting the dependency
relation during calculation of binding scopes.

We use pattern matching of DAGs in a way similar to a method
described in [21]. But because we work in a purely functional
context without effects our formulation is different and is based 
on dynamic recognition of binding scopes in DAGs and extracting 
them using active patterns. This greatly simplifies formulation of 
and reasoning about rewriting transformations. 

The object-oriented extension of the core language is inspired
by Featherweight Java [16], but we adapted their formulation to our
core language.

Isomorphic representations of types have a long history in the
generic programming community (see [14] for an overview), but
the main question is usually how to automatically generate
isomorphisms for user-defined types. We, on the contrary, emphasize
user-defined isomorphisms as a bridge between an abstraction and
some concrete representation in the core language. Thus our approach
is the opposite: we require users to explicitly specify how they
want each concrete implementation to be represented in the core
language. Because staged evaluation happens at runtime we can also
bind this isomorphism specification with a runtime configuration
framework (e.g. using dependency injection).

6. Conclusion 

The potential of a new approach can be judged by theoretical
generality and practical simplicity. We described isomorphic
specialization for an enriched simply-typed lambda calculus extended
with a very limited set of object-oriented constructs just to
capture the essence of the approach and simplify our presentation.
But it is by no means limited by the characteristics of this
language. Our implementation works with polymorphic types and
functions, and supports inheritance hierarchies and multiple
inheritance of abstract types.

At the same time, the behavior of the specialization transformation
is robust, predictable and very efficient in practice. It worked for all
data types we used in implementations of machine learning algo-
rithms such as Logistic Regression and SVM, and it also works well
for defining various representations of graphs and specializing
abstract graph algorithms.

Under not too strict and quite reasonable conditions, isomorphic 
specialization is automatic and provides strong guarantees which 
we formulated as a conjecture in Section 3.3. We didn't prove this
statement formally, but we have strong evidence, supported by
many examples, that it always holds.

Isomorphic specialization as a particular transformation is based 
on the machinery established by staged evaluation. This is a new 
formulation of staging which allows the use of term rewriting to 
simplify graph construction and transformations. The specializa- 
tion transformation presented in this paper is just one possible ap- 
plication of staged evaluation. 

We also showed that generic programming techniques together 
with staging can lead to a very simple yet powerful specialization 
method which can be made first-class in a high-level functional 
language. 

We gave a formalized description of the isomorphic specialization
algorithm and showed that it can be implemented as a set of simple
rewriting rules over graph-based IRs. 
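
To give a flavor of such rules, the following Scala sketch applies two kinds of simplification to a tiny expression IR: cancelling a round trip through an isomorphism and projecting from an explicitly constructed pair. This is a simplified, tree-based illustration under our own assumptions, not the DAG-based formulation of the paper; the node types (Sym, To, From, Pair, Fst) and the function rewrite are hypothetical names.

```scala
// Hypothetical, tree-based illustration; the paper works with graph-based IRs.
object RewriteSketch {
  sealed trait Exp
  final case class Sym(name: String) extends Exp            // a variable of the staged program
  final case class To(iso: String, arg: Exp) extends Exp    // abstraction -> representation
  final case class From(iso: String, arg: Exp) extends Exp  // representation -> abstraction
  final case class Pair(a: Exp, b: Exp) extends Exp
  final case class Fst(arg: Exp) extends Exp

  // A single bottom-up pass: children are rewritten first, then the rules
  // are tried at the current node.
  def rewrite(e: Exp): Exp = e match {
    case To(i, a) => rewrite(a) match {
      case From(j, x) if i == j => x        // to(from(x)) ==> x
      case a1                   => To(i, a1)
    }
    case From(i, a) => rewrite(a) match {
      case To(j, x) if i == j => x          // from(to(x)) ==> x
      case a1                 => From(i, a1)
    }
    case Fst(a) => rewrite(a) match {
      case Pair(x, _) => x                  // fst((x, y)) ==> x
      case a1         => Fst(a1)
    }
    case Pair(a, b) => Pair(rewrite(a), rewrite(b))
    case s: Sym     => s
  }
}
```

For example, rewriting `Fst(To("p", From("p", Pair(Sym("x"), Sym("y")))))` first cancels the `To`/`From` round trip and then projects the first component, leaving only `Sym("x")`.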

In our formalization we rely on acyclic graphs, assuming that we
deal only with non-recursive programs. This may sound like a serious
limitation, but in practice it greatly simplifies the implementation
and reasoning about it, while still allowing us to support many
practical data structures. It is the main limitation of the presented
algorithms, not of the approach itself, and we leave recursion to
future research.

Besides recursion, we have not covered many questions about the
formal properties of the presented methods, and we did not prove
our conjecture. We presented staged evaluation for a call-by-value
language. Following the work of Danvy @], the technique can
probably be made independent of the evaluation order. We rely
on full reification of types, but it may be interesting to further
investigate type usage and characterize minimal requirements
regarding reification of types. These and similar questions are also
directions for future research.

Finally, we limit ourselves to pairs and binary sums of types
in the formalization and, in fact, in our prototype implementation.
Extending to arbitrary tuples and tagged unions would be useful and
could be done using, e.g., the Shapeless [1] library for Scala.
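
Until such an extension is available, larger products and sums can be encoded by hand with nested pairs and nested binary sums. The Scala sketch below shows one such encoding using only the standard library (it does not use Shapeless); the names tripleToPairs, pairsToTriple, Shape, ShapeRep, shapeToSums and sumsToShape are hypothetical.

```scala
// Hypothetical manual encodings; a generic library could derive these automatically.
object NestedEncodingSketch {
  // A triple encoded with pairs only: (A, B, C) <-> (A, (B, C)).
  def tripleToPairs[A, B, C](t: (A, B, C)): (A, (B, C)) = (t._1, (t._2, t._3))
  def pairsToTriple[A, B, C](p: (A, (B, C))): (A, B, C) = (p._1, p._2._1, p._2._2)

  // A three-way tagged union ...
  sealed trait Shape
  final case class Circle(r: Double) extends Shape
  final case class Square(side: Double) extends Shape
  final case class Rect(w: Double, h: Double) extends Shape

  // ... encoded with binary sums (Either) and pairs only.
  type ShapeRep = Either[Double, Either[Double, (Double, Double)]]

  def shapeToSums(s: Shape): ShapeRep = s match {
    case Circle(r)  => Left(r)
    case Square(a)  => Right(Left(a))
    case Rect(w, h) => Right(Right((w, h)))
  }

  def sumsToShape(rep: ShapeRep): Shape = rep match {
    case Left(r)              => Circle(r)
    case Right(Left(a))       => Square(a)
    case Right(Right((w, h))) => Rect(w, h)
  }
}
```

Each pair of functions forms an isomorphism, so a data type with more than two components or alternatives can still be brought into the pair-and-binary-sum fragment covered by the formalization.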